HDFS, Big Data Distributed File System

HDFS stands for Hadoop Distributed File System. It is meant to store large amount of data sets with a high level of reliability, distributing these data sets to user applications at high bandwidth.

HDFS is the Hadoop components to distribute data among multiple servers. The number of servers can be high, up to several thousands, for a data volume of up to hundreds of terabytes. The great strength of HDFS is it ability to not only detect but also anticipate potential failures of these servers. The fault tolerance provided in HDFS is made possible thanks to its ability to relaunch a node or to assign a task to another node.

HDFS Architecture Schema:

HDFS architecture
HDFS Architecture in Hadoop

Each server consolidation, called cluster, is associated with a single NameNode that provides naming each machine and saves the queries data access. This NameNode provides all consulting operations. Storage spaces are called Datanode in which are stored the database files. The DataNodes support read and write data operations, along with create, delete and replicate operations.

See also: Plate-forme HadoopMapReduceBig Data Job Postings