Category: AG Quiz


  • This is a regular “data quiz”. Follow it on LinkedIn. Test your knowledge or learn something new.
    Today's question: Which Airflow component defines dependencies?
    A) Task  B) DAG  C) Operator  D) Hook
    Correct answer: B
    Explanation: A DAG (Directed Acyclic Graph) is the core building block of Apache Airflow that defines the entire workflow, including all tasks and their dependencies. A DAG is an acyclic…

  • Today's question: Which of these systems is a batch engine?
    A) Apache Spark  B) Kafka  C) Flink  D) RabbitMQ
    Correct answer: A
    Explanation: Apache Spark is primarily a batch processing engine that has become the de facto standard for large-scale data processing in distributed environments. Unlike streaming systems, Spark processes data in…

  • Today's question: In data pipelines, an “idempotent operation” behaves as follows:
    A) Repetition changes the result  B) Repetition has no further effect  C) It speeds up processing  D) It requires rollback
    Correct answer: B
    Explanation: An idempotent operation in data pipelines is one whose repeated execution produces the same final result without any further changes. This is critical in distributed systems where duplicate execution or…

  • Today's question: Which principle reduces costs in big data processing?
    A) Denormalization  B) Partitioning  C) Sharding  D) All of the above
    Correct answer: D
    Explanation: All three principles (denormalization, partitioning, and sharding) are effective strategies for reducing costs in big data processing. DENORMALIZATION reduces the number of JOIN operations by combining data from multiple tables into…

  • Today's question: Which format is most efficient for big data processing?
    A) CSV  B) JSON  C) Parquet  D) XML
    Correct answer: C
    Explanation: Apache Parquet is a columnar storage format designed specifically for analytical workloads and big data processing. Unlike row-based formats such as CSV, Parquet stores data by columns, which enables…

  • Today's question: Which tool is typically used for ETL?
    A) Airflow  B) Grafana  C) Superset  D) Looker
    Correct answer: A
    Explanation: Apache Airflow is an open-source workflow orchestration platform that has become the de facto standard for managing ETL and data pipeline processes. Airflow enables defining, scheduling, and monitoring complex…

  • AgQuiz #10 – ETL

    Today's question: In an ETL process, the staging area is intended for:
    A) Archive  B) Temporary processing  C) Analysis  D) Visualization
    Correct answer: B
    Explanation: The staging area in an ETL process is a temporary storage layer where data from various sources is loaded before being transformed and moved to the target data warehouse.…

  • AgQuiz #9 – ETL

    Today's question: ETL stands for:
    A) Extract, Transform, Load  B) Export, Transfer, Load  C) Extract, Transfer, Link  D) Encode, Transform, Load
    Correct answer: A
    Explanation: ETL (Extract, Transform, Load) is a fundamental process in data engineering and business intelligence that describes three key phases of data processing. The EXTRACT phase retrieves data from various…

  • AgQuiz #8 – ACID

    Today's question: Which method ensures ACID properties in a database?
    A) Indexing  B) Transactions  C) Cache  D) Sharding
    Correct answer: B
    Explanation: Transactions are the core mechanism that ensures ACID properties in databases. ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity guarantees that a transaction is completed fully…

  • Today's question: Which database is NoSQL?
    A) MySQL  B) PostgreSQL  C) MongoDB  D) SQLite
    Correct answer: C
    Explanation: MongoDB is one of the most popular NoSQL document databases. It stores data in BSON (a binary form of JSON) documents instead of relational tables. NoSQL databases…
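
Two of the quizzes above, the one on idempotent operations and the one on transactions, can be illustrated together with a small sketch. It uses Python's built-in `sqlite3` module, and the `orders` table and the `load_orders` helper are hypothetical names chosen only for this example. The load runs inside a transaction (all rows or none), and it writes by primary key, so a retried run should leave the table in the same state as a single run.

```python
import sqlite3


def load_orders(conn, rows):
    """Hypothetical loader: write rows keyed by id inside one transaction."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.executemany(
            # INSERT OR REPLACE overwrites an existing row with the same
            # primary key, so repeating the load does not add duplicates.
            "INSERT OR REPLACE INTO orders (id, amount) VALUES (?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

batch = [(1, 10.0), (2, 25.5)]
load_orders(conn, batch)
load_orders(conn, batch)  # simulated retry of the same batch

rows = conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(rows)
```

A plain `INSERT` would fail on the retry with a primary-key conflict, and an insert into a table without a key would silently duplicate rows. Keying the write is one common way, though not the only one, to make a load step safe to repeat.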