Spark, The New Big Data Framework

Spark is an open source framework for distributed computing. Spark, also known as Apache Spark, was originally a project developed by the AMPLab at UC Berkeley. Spark later became a top-level project of the Apache Software Foundation.

Whereas Hadoop's MapReduce model writes intermediate results to disk between processing stages, Spark works directly in memory, which can make it far more efficient (processing up to a hundred times faster for some workloads). This performance advantage is a major reason for the growing number of Spark adopters around the world.
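To make the contrast concrete, here is a minimal plain-Python sketch of the map, shuffle, and reduce phases behind a MapReduce-style word count. This uses neither Hadoop nor Spark; the function names and sample data are illustrative only. In Hadoop, the shuffle stage is materialized on disk, while Spark keeps such intermediate data in memory across chained transformations.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group counts by word (the stage Hadoop writes to disk).
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["spark is fast", "hadoop is robust", "spark is in memory"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
```

The disk round-trip between each phase is what Spark's in-memory execution avoids when several such stages are chained together.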

Spark requires a cluster manager: its own standalone manager, Apache Mesos, or Hadoop YARN. Spark also needs a distributed storage system such as HDFS (Hadoop Distributed File System), Cassandra, Amazon S3, or OpenStack Swift.
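As a rough illustration of how these pieces fit together, a job might be submitted to a YARN-managed cluster with `spark-submit`, reading input from HDFS. The application script, hostname, and path below are placeholders, not values from this article:

```shell
# Hypothetical submission to a YARN cluster; script and HDFS URL are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  wordcount.py hdfs://namenode:9000/data/input.txt
```

Swapping `--master yarn` for a standalone or Mesos master URL is how the same application targets a different cluster manager.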

See also:
Hadoop
HDFS Architecture
MapReduce
Big Data Jobs