This is a regular “data quiz”. Follow it on LinkedIn. Test your knowledge or learn something new.

Today's Question: Which Airflow component defines dependencies?
A) Task
B) DAG
C) Operator
D) Hook

Correct Answer: B

Explanation: A DAG (Directed Acyclic Graph) is the core building block of Apache Airflow. It defines the entire workflow, including all tasks and the dependencies between them. A DAG is an acyclic…
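To make the idea of "tasks plus dependencies" concrete, here is a minimal sketch in plain Python. It does not use Airflow's API; it only models a dependency graph with the standard-library `graphlib` module. The task names (`extract`, `transform`, `load`, `report`) and the helper `execution_order` are hypothetical and chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one valid run order for the graph.

    Raises graphlib.CycleError if the graph contains a cycle,
    which is why a workflow graph has to be acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because every task here has exactly one upstream task, only one run order is possible. Adding an edge that points back to an earlier task (for example, making `extract` depend on `report`) would make the graph cyclic, and `execution_order` would raise `graphlib.CycleError`.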
Today's Question: Which of these systems is a batch engine?
A) Apache Spark
B) Kafka
C) Flink
D) RabbitMQ

Correct Answer: A

Explanation: Apache Spark is primarily a batch processing engine that has become the de facto standard for large-scale data processing in distributed environments. Unlike streaming systems, Spark processes data in…
Today's Question: In data pipelines, an “idempotent operation” behaves as:
A) Repetition changes the result
B) Repetition has no effect
C) It speeds up processing
D) It requires rollback

Correct Answer: B

Explanation: An idempotent operation in a data pipeline is one whose repeated execution produces the same final result, without any further changes. This is critical in distributed systems where duplicate execution or…
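The difference between an idempotent and a non-idempotent load can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a production pattern: the "warehouse" is an in-memory dict keyed by a hypothetical `order_id` field, and the function names `load_idempotent` and `load_append` are invented for the example.

```python
def load_idempotent(target: dict, batch: list) -> dict:
    """Upsert by key: re-running the same batch leaves the target unchanged."""
    for row in batch:
        target[row["order_id"]] = row
    return target


def load_append(target: list, batch: list) -> list:
    """Blind append: every re-run adds the same rows again."""
    target.extend(batch)
    return target


# Hypothetical batch of records.
batch = [
    {"order_id": 1, "amount": 10},
    {"order_id": 2, "amount": 25},
]

warehouse = {}
load_idempotent(warehouse, batch)
load_idempotent(warehouse, batch)  # simulated retry of the same batch

log = []
load_append(log, batch)
load_append(log, batch)  # the same retry now creates duplicates

print(len(warehouse), len(log))
```

After the simulated retry, the keyed target still holds two rows, while the append-only target holds four.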
Today's Question: Which principle reduces costs in big data processing?
A) Denormalization
B) Partitioning
C) Sharding
D) All of the above

Correct Answer: D

Explanation: All three principles (denormalization, partitioning, and sharding) are effective strategies for reducing costs in big data processing. DENORMALIZATION reduces the number of JOIN operations by combining data from multiple tables into…
Today's Question: Which format is most efficient for big data processing?
A) CSV
B) JSON
C) Parquet
D) XML

Correct Answer: C

Explanation: Apache Parquet is a columnar storage format designed specifically for analytical workloads and big data processing. Unlike row-based formats such as CSV, Parquet stores data by columns, which enables…
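The row-versus-column distinction can be illustrated with plain Python data structures. This sketch does not read or write Parquet files and says nothing about Parquet's actual on-disk encoding; it only shows the layout idea. The sample records and field names are hypothetical.

```python
# Row-oriented layout: one complete record after another (as in CSV).
rows = [
    {"user": "ann", "country": "DE", "amount": 12.0},
    {"user": "bob", "country": "US", "amount": 30.0},
    {"user": "cat", "country": "DE", "amount": 8.0},
]

# Column-oriented layout: all values of one field are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one field only needs that field's list;
# the "user" and "country" values are never touched.
total_amount = sum(columns["amount"])
print(total_amount)
```

In the row layout, the same aggregate would have to walk every record and pick out one field from each.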
Today's Question: Which tool is typically used for ETL?
A) Airflow
B) Grafana
C) Superset
D) Looker

Correct Answer: A

Explanation: Apache Airflow is an open-source workflow orchestration platform that has become the de facto standard for managing ETL and data pipeline processes. Airflow enables defining, scheduling, and monitoring complex…
Today's Question: In an ETL process, the staging area is intended for:
A) Archive
B) Temporary processing
C) Analysis
D) Visualization

Correct Answer: B

Explanation: The staging area in an ETL process is a temporary storage layer where data from various sources is loaded before being transformed and moved to the target data warehouse.…
Today's Question: ETL stands for:
A) Extract, Transform, Load
B) Export, Transfer, Load
C) Extract, Transfer, Link
D) Encode, Transform, Load

Correct Answer: A

Explanation: ETL (Extract, Transform, Load) is a fundamental process in data engineering and business intelligence that describes three key phases of data processing. The EXTRACT phase retrieves data from various…
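The three phases can be sketched end to end with the Python standard library. This is a toy example under stated assumptions: the source is an inline CSV string, the target is an in-memory SQLite database, and the table name `sales`, the column names, and the sample values are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with untidy name values.
RAW_CSV = "id,name,amount\n1, alice ,10.5\n2, bob ,20\n"


def extract(text):
    """EXTRACT: read raw records from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """TRANSFORM: cast types and normalize the name field."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """LOAD: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
result = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(result)
```

Keeping each phase in its own function makes it possible to test the transformation logic without touching the source or the target.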
Today's Question: Which method ensures ACID properties in a database?
A) Indexing
B) Transactions
C) Cache
D) Sharding

Correct Answer: B

Explanation: Transactions are the core mechanism that ensures ACID properties in databases. ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity guarantees that a transaction is completed fully…
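Atomicity in particular is easy to demonstrate with Python's built-in `sqlite3` module. The sketch below assumes a hypothetical `accounts` table with a `CHECK` constraint that forbids negative balances; the `transfer` and `balances` helpers are invented for the example. Using the connection as a context manager commits the transaction on success and rolls it back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts "
    "(name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts").fetchall())


failed = transfer(conn, "a", "b", 500)  # overdraws account "a"
after_failed = balances(conn)
succeeded = transfer(conn, "a", "b", 30)
after_success = balances(conn)
print(failed, after_failed, succeeded, after_success)
```

The failed transfer leaves both balances exactly as they were, even though the credit to the destination account had already been executed inside the transaction.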
Today's Question: Which database is NoSQL?
A) MySQL
B) PostgreSQL
C) MongoDB
D) SQLite

Correct Answer: C

Explanation: MongoDB is one of the most popular NoSQL document databases. It stores data as BSON (a binary form of JSON) documents instead of in relational tables. NoSQL databases…