Hadoop, The Big Data Platform

Hadoop is an open source Big Data platform, maintained by the Apache Software Foundation, that has established itself as a quasi-standard in the market for massive data solutions. It was designed to manage considerable volumes of data, both structured and unstructured, in a distributed environment.

Google is at the origin of the Hadoop platform: its work on distributed storage and MapReduce was then taken up and continued within an open source framework. Yahoo! has also done much to develop the platform into a complete development environment for business.

A diagram showing the various components of the Hadoop platform:

Figure: Hadoop Big Data Platform (credit: lebigdata.com)

Hadoop aims to manage large volumes of heterogeneous data, both structured and unstructured. The platform can perform complex calculations, including statistics, on large amounts of data. For example, it has met the needs of web companies seeking to identify strong market trends from the searches carried out by Internet users.

Hadoop can process data in a distributed architecture across many remote servers, using the storage and computing capacity of each of them. This is one of its main strengths: combining servers into a cluster increases its processing capacity. Hadoop provides high data availability and can mitigate potential server failures through a data replication system. The role of the MapReduce component in Hadoop is to manage the distribution of processing across the servers and to retrieve the results in condensed form.
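To make the MapReduce idea more concrete, here is a minimal sketch in plain Python. It does not use the Hadoop API and runs in a single process: each inner list in `chunks` merely stands in for the data held by one server, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample search queries are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of search queries."""
    return [(word.lower(), 1) for query in chunk for word in query.split()]


def shuffle(mapped_pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Condense the values of each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(chunks):
    # Assumption: chunks are processed one after another here; on a real
    # cluster each chunk would be mapped on the server that stores it.
    mapped = []
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: two "servers", each holding a few search queries.
chunks = [
    ["hadoop tutorial", "big data"],
    ["hadoop cluster", "big data platform"],
]
counts = run_job(chunks)
print(counts)
```

The map step works on each chunk independently, which is what allows the work to be spread over many servers, while the reduce step produces the condensed result: one total per search term.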

See also: HDFS Architecture, MapReduce, Jobs