Apache Spark is a unified engine for large-scale data processing, with an interface for programming clusters with implicit data parallelism and fault tolerance.

Spark’s primary abstraction is a distributed collection of items called a Dataset. Datasets can be created from Hadoop InputFormats (such as HDFS files) or by transforming other Datasets.
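
The two ways of obtaining a Dataset described above can be sketched in Scala roughly as follows. This is a minimal illustration, not a reference implementation: the object name `DatasetSketch`, the temporary input file, and the `local[*]` master are assumptions made here so the example is self-contained, and a local file stands in for an HDFS path.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, SparkSession}

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation (assumption); a cluster deployment
    // would normally supply the master through spark-submit instead.
    val spark = SparkSession.builder()
      .appName("DatasetSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides the encoders that map() needs

    // Hypothetical input: a small temporary text file standing in for an HDFS file.
    val path = Files.createTempFile("dataset-sketch", ".txt")
    Files.write(path, "Apache Spark\nHadoop HDFS\nSpark Datasets\n".getBytes("UTF-8"))

    // 1. Create a Dataset from a file: each line becomes one element.
    val lines: Dataset[String] = spark.read.textFile(path.toString)

    // 2. Create new Datasets by transforming an existing one.
    val sparkLines: Dataset[String] = lines.filter(line => line.contains("Spark"))
    val lengths: Dataset[Int] = sparkLines.map(line => line.length)

    // Actions trigger the actual distributed computation.
    println(sparkLines.count())
    lengths.show()

    spark.stop()
  }
}
```

Transformations such as `filter` and `map` only describe a new Dataset; Spark runs the work when an action such as `count` or `show` is called.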