2. HDFS

2.1

NameNode

The NameNode acts as the master of the cluster. It stores metadata: the blocks of each file, their replicas, and other details. This metadata is held in memory on the master. The NameNode maintains and manages the slave nodes and assigns tasks to them. It should be deployed on reliable hardware, as it is the centerpiece of HDFS.

DataNode

The DataNode acts as a slave in the cluster. In HDFS, the DataNode is responsible for storing the actual data, and it serves read and write requests from clients. DataNodes can be deployed on commodity hardware.

fsimage/edit log/metadata

  • fsimage = paths + block IDs + owner + group + permissions (an on-disk snapshot of the namespace, which the NameNode also keeps in memory)

  • edit log = operations (written to disk)

  • metadata = fsimage + edit log
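
The relationship metadata = fsimage + edit log can be illustrated with a minimal sketch (plain Python, not Hadoop code; the paths and operations are hypothetical): the current namespace is reconstructed by replaying the logged operations on top of the snapshot.

```python
# Sketch only: reconstruct current metadata by replaying the edit log
# over the fsimage snapshot. All names here are hypothetical.

def replay(fsimage, edit_log):
    """Apply logged operations to a snapshot of the namespace."""
    namespace = dict(fsimage)  # fsimage: path -> metadata
    for op, path, arg in edit_log:
        if op == "create":
            namespace[path] = arg
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "chmod":
            namespace[path] = {**namespace[path], "perm": arg}
    return namespace

fsimage = {"/data/a.txt": {"blocks": [1, 2], "perm": "644"}}
edits = [
    ("create", "/data/b.txt", {"blocks": [3], "perm": "644"}),
    ("chmod", "/data/a.txt", "600"),
    ("delete", "/data/b.txt", None),
]
current = replay(fsimage, edits)  # current metadata = fsimage + edit log
```

Checkpointing (see Secondary NameNode below) is exactly this replay done offline, producing a new fsimage so the edit log can be truncated.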

Secondary NameNode

Performs the memory-intensive administrative work (checkpointing) on behalf of the NameNode.

  • The NameNode writes metadata changes to an edit log

  • The Secondary NameNode periodically combines the prior filesystem snapshot (fsimage) and the edit log into a new snapshot

  • The new snapshot is transmitted back to the NameNode

  • The Secondary NameNode should run on a separate machine in a large installation

  • It requires as much RAM as the NameNode

Standby NameNode

The Standby NameNode serves as a backup NameNode and adds failover capability to the Hadoop cluster (see 2.2 for details).

2.2

Architecture

The HA architecture solves the problem of NameNode availability by allowing us to run two NameNodes in an active/passive configuration. So, in a High Availability cluster, two NameNodes run at the same time:

  • Active NameNode

  • Standby/Passive NameNode

With the Standby NameNode, we get automatic failover whenever the active NameNode crashes (an unplanned event), as well as graceful (manually initiated) failover during maintenance periods.

JournalNodes. The standby NameNode and the active NameNode keep in sync with each other through a separate group of nodes, or daemons, called JournalNodes. The active NameNode is responsible for writing the EditLogs (metadata changes) to the JournalNodes. The Standby NameNode continuously reads the changes made to the EditLogs in the JournalNodes and applies them to its own namespace.

HA through NFS

The Standby NameNode and the active NameNode stay in sync by using a shared storage device. The active NameNode logs every modification of its namespace to an EditLog on this shared storage. The Standby NameNode reads the changes made to this EditLog and applies them to its own namespace.

HA through QJM

The Quorum Journal Manager (QJM) in the NameNode writes file system journal logs to the JournalNodes. A journal log is considered successfully written only when it has been written to a majority of the JournalNodes. Only one of the NameNodes can achieve this quorum write. In the event of a split-brain scenario, this ensures that the file system metadata will not be corrupted by two active NameNodes.
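
The quorum rule can be shown with a minimal sketch (plain Python, not actual QJM code; the function name is hypothetical): an edit counts as committed only when a strict majority of JournalNodes acknowledge it, and since two writers cannot both hold a majority at the same time, split-brain writes are fenced off.

```python
# Sketch only: the majority rule behind QJM commits.

def quorum_write(acks, num_journal_nodes):
    """Return True if the write reached a strict majority of JournalNodes."""
    return acks > num_journal_nodes // 2

# With 3 JournalNodes, 2 acks form a quorum; 1 does not.
committed = quorum_write(2, 3)
rejected = quorum_write(1, 3)
```

Because 2 of 3 (or 3 of 5) nodes can back only one writer at a time, at most one NameNode's edits ever reach the journal.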

2.3

Architecture

In Hadoop, HDFS splits huge files into small chunks known as data blocks. HDFS data blocks are the smallest unit of data in the filesystem. We (clients and admins) have no control over the data blocks, such as block location; the NameNode decides all such things.

HDFS stores each file as a sequence of data blocks. However, the block size in HDFS is very large: the default HDFS block size is 128 MB, which you can configure as per your requirements. All blocks of a file are the same size except the last block, which can be either the same size or smaller.

Files are thus split into 128 MB blocks and then stored in the Hadoop file system.
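
The splitting rule above can be sketched as follows (illustrative Python, not HDFS code): every block is the full block size except possibly the last one.

```python
# Sketch only: how a file is split into fixed-size HDFS blocks.

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size, in bytes

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    full, last = divmod(file_size, block_size)
    return [block_size] * full + ([last] if last else [])

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
sizes = split_into_blocks(300 * 1024 * 1024)
```

Note that the last, smaller block only occupies as much DataNode disk as it actually contains; the cost of partially filled blocks falls on NameNode metadata, as the next section explains.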

Small files problem

Since the metadata is stored in the NameNode’s RAM, and each entry for a file (with its block locations) takes some space, a large number of small files results in many entries and takes up far more RAM than the few entries needed for the same data in large files. Also, each file smaller than the block size still occupies a block of its own in the metadata. That is why HDFS is better suited to large files than to small ones.
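
A back-of-the-envelope sketch of the memory impact (illustrative Python; the ~150 bytes per namespace object is a commonly cited rule of thumb, not an exact Hadoop constant):

```python
# Sketch only: rough NameNode RAM cost of small vs. large files.

BYTES_PER_OBJECT = 150          # rule-of-thumb RAM per file or block entry
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size

def namenode_ram(num_files, file_size):
    """Approximate NameNode RAM used by num_files files of file_size bytes."""
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)            # 1 file entry + blocks
    return objects * BYTES_PER_OBJECT

one_gib = 1024 ** 3
# The same 1 GiB of data, stored two different ways:
big = namenode_ram(1, one_gib)         # 1 file of 8 blocks  -> 9 objects
small = namenode_ram(1024, 1024 ** 2)  # 1024 files, 1 block each -> 2048 objects
```

The same gigabyte costs hundreds of times more NameNode RAM when stored as 1 MiB files than as a single large file, which is the small files problem in a nutshell.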

Solutions:

  • HDFS federation: shipped as part of HDP 3.0, it allows multiple NameNodes to work against a shared set of DataNodes

  • sequence files

  • Hadoop Archives (note: they do not compress the data!)

2.4

Data locality

  • tasks (Map, Reduce) are copied to the nodes holding the data; the data is not copied to the tasks

  • YARN exploits data locality: the framework knows where to copy tasks

  • this minimizes network I/O: tasks read from local disk, not over the network
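
The idea behind the bullets above can be sketched as follows (illustrative Python, not YARN's actual scheduling algorithm; all node names are hypothetical): prefer a node that already holds a replica of the task's input block, and fall back to a remote node only when no local option exists.

```python
# Sketch only: data-local task placement.

def place_task(block_replicas, available_nodes):
    """Return (node, locality) for a task, preferring data-local nodes."""
    for node in available_nodes:
        if node in block_replicas:
            return node, "data-local"   # input read from local disk
    return available_nodes[0], "remote" # input must cross the network

replicas = {"node2", "node5", "node7"}      # nodes holding the input block
node, locality = place_task(replicas, ["node1", "node2", "node3"])
```

With the default replication factor of 3, most tasks find at least one data-local (or at least rack-local, in the real scheduler) placement.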

Checkpoint Node

A new implementation of the Secondary NameNode that addresses its drawbacks. The main function of the Checkpoint Node in Hadoop is to create periodic checkpoints of the file system metadata by merging the edits file with the fsimage file. The new fsimage produced by the merge is usually called a checkpoint.

Backup Node

The Backup Node in Hadoop is an extended Checkpoint Node that performs checkpointing and also supports online streaming of file system edits from the NameNode.
