Big Data Architecture

A Big Data infrastructure is mostly composed of three elements (or their equivalents on the market): Hadoop, HDFS and MapReduce. These three essential components are designed to enable the collection, storage, analysis and consolidation of large volumes of heterogeneous data, consisting of structured and unstructured data.


The Hadoop solution has rapidly become the core solution of the Big Data market. This is an open source framework for distributed architecture. Developments of Hadoop framework (in Java) are insured by the Apache Foundation. Just as the distributed file system HDFS (hadoop Distributed File System).

Hadoop supports several types of distributed DBMS (DBMS NoSQL), first HDFS and HBase, but it also supports other database management systems such as NoSQL Cassandra example.
Read more about Hadoop


Hadoop HDFS
HDFS provides distribution data on the servers, whose number can be very high. HDFS is able to compensate for the failures of some servers and ensure no performance hit the continuous processing.
Read more about sur HDFS


Hadoop MapReduce
MapReduce is an architecture model capable of performing complex calculations and distributed on large distributed data volumes.

Read more about sur MapReduce

See also: Hadoop PlatformHDFS ArchitectureMapReduceData Science Jobs