This is a regular “data quiz”. Follow it on LinkedIn. Test your knowledge or learn something new.

Today's Question:

Which of these systems is a batch engine?

A) Apache Spark 

B) Kafka 

C) Flink 

D) RabbitMQ


Correct Answer: A

Explanation

Apache Spark is primarily a batch processing engine and has become the de facto standard for large-scale data processing in distributed environments. In batch mode, Spark reads a bounded dataset, splits it into partitions, applies transformations to those partitions in parallel across the cluster, and writes the results. It keeps intermediate data in memory where possible and spills to disk when it does not fit, so the dataset as a whole does not have to fit in memory. Spark excels at complex analytical workloads, machine learning, ETL, and reporting. Its key features include in-memory computation (RDDs and DataFrames), fault tolerance, lazy evaluation, and broad language support (Scala, Python, Java, R).

Kafka is an event streaming platform and message broker, not a batch processing engine. Flink supports both modes but is primarily a stream processing engine. RabbitMQ is a message broker.

Spark also supports stream processing (Spark Streaming and its successor, Structured Streaming), which by default runs as a series of micro-batches. Its main strength remains batch processing of large datasets, where parallelism and in-memory computation let it run complex analytical jobs more efficiently than traditional MapReduce.
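
To make "lazy evaluation" and partition-wise batch processing more concrete, here is a minimal sketch in plain Python. It is a toy model and not Spark's API: the `LazyDataset` class, its methods, and the sample data are all hypothetical names invented for illustration. Transformations only record a plan; nothing runs until an action (`collect`) is called, and the plan is then applied to each partition of a bounded input.

```python
class LazyDataset:
    """Toy stand-in for a partitioned batch dataset (hypothetical, not Spark)."""

    def __init__(self, partitions, ops=()):
        self.partitions = partitions  # list of lists: the bounded input
        self.ops = tuple(ops)         # recorded transformations (the "plan")

    def map(self, fn):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self.partitions, self.ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self.partitions, self.ops + (("filter", pred),))

    def _run_partition(self, rows):
        # Apply the recorded plan to a single partition.
        for kind, fn in self.ops:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return rows

    def collect(self):
        # Action: execute the plan on every partition and gather the results.
        # A real engine would run partitions in parallel on many machines;
        # this sketch simply loops over them.
        out = []
        for part in self.partitions:
            out.extend(self._run_partition(part))
        return out


calls = []  # records when the transformation function actually runs


def double(x):
    calls.append(x)
    return x * 2


dataset = LazyDataset([[1, 2, 3], [4, 5, 6]])       # two partitions
plan = dataset.map(double).filter(lambda x: x > 4)  # builds a plan only
calls_before_action = list(calls)                   # still empty here
result = plan.collect()                             # now the work happens
```

After `collect()`, `result` holds the filtered values from both partitions, while `calls_before_action` stays empty, which shows that defining the transformations did no work on its own.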