Hadoop Core

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
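
The "simple programming models" referred to above usually mean MapReduce. As a concrete illustration, below is a lightly commented version of the canonical WordCount job from the Hadoop MapReduce tutorial. The input and output directories are placeholders passed on the command line; the class would be packaged as a JAR and submitted with `hadoop jar`, either to a cluster or in local mode.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on each input split in parallel and emits (word, 1)
  // for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```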

Pros & Cons

PROS

  • Scalability – By adding nodes, we can easily grow our system to handle more data.

  • Flexibility – In this framework, you don’t have to preprocess data before storing it. You can store as much data as you want and decide later how to use it.

  • Fault tolerance – If a node goes down, jobs are automatically redirected to other nodes so processing continues (see the configuration sketch after this list).

  • Computing power – Its distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have.
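
To make the fault-tolerance point above concrete, here is a minimal, illustrative sketch of the standard configuration properties involved: dfs.replication controls how many DataNodes hold a copy of each HDFS block, and mapreduce.map.maxattempts / mapreduce.reduce.maxattempts control how many times a failed task is rescheduled on another node. The class name and printed output are purely for demonstration; in practice these values are usually set in hdfs-site.xml and mapred-site.xml rather than in code, and the values shown are defaults, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FaultToleranceConfig {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Each HDFS block is stored on several DataNodes, so losing one node
    // does not lose data (dfs.replication defaults to 3).
    conf.setInt("dfs.replication", 3);

    // If a map or reduce task fails (for example, because its node went down),
    // the framework reschedules it on another node up to this many times.
    conf.setInt("mapreduce.map.maxattempts", 4);
    conf.setInt("mapreduce.reduce.maxattempts", 4);

    // The same Configuration object is what a client uses to talk to HDFS.
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Default replication: " +
        fs.getDefaultReplication(new Path("/")));
    fs.close();
  }
}
```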

CONS

  • Security concerns – Simply managing a platform as complex as Hadoop can be challenging. Its security features are disabled by default, and if whoever manages the platform does not know how to enable them, your data could be at serious risk. Hadoop is also missing encryption at the storage and network levels, which is a major point of concern.

  • Not fit for small data – Hadoop is not suited for small data sets, because HDFS lacks the ability to efficiently support the random reading of small files.

  • Potential stability issues – As an open-source framework, Hadoop is built by many developers who continue to work on the project, and the constant stream of improvements can introduce stability issues. To avoid them, organizations should run on the latest stable version.
