Big Data Fundamentals
Understand what "big data" means, its key characteristics, and common technologies used to process massive datasets.
The 4Vs of Big Data
- Volume: large amounts of data, measured in gigabytes (GB), terabytes (TB), or petabytes (PB).
- Velocity: speed at which data is generated and must be processed (streams, real-time events).
- Variety: different types (structured, semi-structured, unstructured).
- Veracity: data quality and reliability.
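To make Variety concrete, here is a minimal Python sketch of how the three shapes of data differ when read by a program. The sample records, field names, and review text are invented for illustration and do not come from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "user_id,amount\n1,9.50\n2,20.00\n"  # hypothetical sample data
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing records whose fields may differ.
json_lines = [
    '{"user_id": 1, "tags": ["new"]}',
    '{"user_id": 2, "device": {"os": "linux"}}',
]
events = [json.loads(line) for line in json_lines]

# Unstructured: free text with no schema; any structure must be derived.
review = "Fast delivery, but the box arrived damaged."
word_count = len(review.split())

print(rows[0]["amount"])         # access by column name
print(sorted(events[1].keys()))  # keys vary from record to record
print(word_count)                # a derived feature of the raw text
```

Structured rows can be queried directly by column, semi-structured records need per-record checks for which keys exist, and unstructured text yields nothing until some processing step extracts features from it.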
Big Data Tools
Hadoop
Provides distributed storage (HDFS) and batch processing (MapReduce, which is now less popular than Spark).
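The MapReduce model can be illustrated with a word count that runs in a single Python process. This is only a sketch of the programming model, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` are made up for this example, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Map every line, group the emitted pairs by word, then reduce each group.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big systems", "data streams"])
print(counts)  # {'big': 2, 'data': 2, 'systems': 1, 'streams': 1}
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, both steps can be distributed freely; the shuffle in between is the step that moves data across the network.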
Apache Spark
Fast, general-purpose engine for big data processing; supports batch, streaming, SQL, machine learning (ML), and graph workloads.
Cloud Services
Managed Hadoop and Spark clusters (AWS EMR, GCP Dataproc, Azure HDInsight) and cloud data warehouses (BigQuery, Snowflake), among others.