Hadoop 2.x vs 3.x 22-point comparison, Hadoop 3.x improvements over 2.x

Question Guide
1. How does Hadoop 3.x tolerate faults?
2. How much has Hadoop 3.x reduced storage overhead?
3. Is Hadoop 3.x MR API compatible with Hadoop 1.x?

1. Purpose

This post compares Hadoop 2.x and Hadoop 3.x: which features are new in Hadoop 3, which Hadoop 2 programs remain compatible with Hadoop 3, and how the two versions differ.

2. Comparison between Hadoop 2.x and Hadoop 3.x

This section describes 22 differences between Hadoop 2.x and Hadoop 3.x, one at a time.

2.1 License

Hadoop 2.x - Apache 2.0, open source
Hadoop 3.x - Apache 2.0, open source

2.2 Minimum supported Java version

Hadoop 2.x - the minimum supported version of Java is Java 7
Hadoop 3.x - the minimum supported version of Java is Java 8

2.3 Fault Tolerance

Hadoop 2.x - Fault tolerance is handled by replication, which costs extra storage space.
Hadoop 3.x - Fault tolerance can also be handled through erasure coding, in addition to replication.
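
To make the idea of erasure coding concrete, here is a minimal sketch that uses a single XOR parity block. This is a much simpler scheme than the Reed-Solomon codes normally associated with HDFS erasure coding, and it is not how HDFS implements it. The function names and the toy data are hypothetical. It only shows the principle: a lost block is rebuilt from the surviving blocks plus parity, rather than from a full copy.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity block."""
    return xor_blocks(surviving_blocks + [parity])


# Three data blocks of equal size (toy data, not real HDFS blocks).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second block and rebuilding it from the rest.
rebuilt = recover_missing([data[0], data[2]], parity)
print(rebuilt == data[1])
```

A single XOR parity block can only repair one lost block per group. Schemes with several parity blocks tolerate more simultaneous losses.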

2.4 Data Balancing

Hadoop 2.x - Data is balanced across DataNodes with the HDFS balancer.
Hadoop 3.x - Adds an intra-DataNode balancer, invoked through the hdfs diskbalancer CLI, which balances data across the disks within a single DataNode.
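
The difference between the two balancers can be pictured with a small sketch. The disk sizes and the function name below are made up for illustration. They only show that a DataNode can look moderately full as a whole while its individual disks are unevenly filled, which is the situation intra-DataNode balancing is meant to address.

```python
def utilization(used_gb, capacity_gb):
    """Return used space as a fraction of capacity."""
    return used_gb / capacity_gb


# Hypothetical DataNode with three disks: (used GB, capacity GB).
disks = [(900, 1000), (500, 1000), (100, 1000)]

node_used = sum(used for used, _ in disks)
node_capacity = sum(cap for _, cap in disks)

# Node-level view: what balancing between DataNodes looks at.
node_utilization = utilization(node_used, node_capacity)
print(f"node utilization: {node_utilization:.0%}")

# Disk-level view: what intra-DataNode balancing evens out.
per_disk = [utilization(used, cap) for used, cap in disks]
disk_spread = max(per_disk) - min(per_disk)
print(f"disk spread: {disk_spread:.0%}")
```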

2.5 Storage Scheme

Hadoop 2.x - Uses a 3x replication scheme.
Hadoop 3.x - Supports erasure coding in HDFS.

2.6 Storage Overhead

Hadoop 2.x - With 3x replication, HDFS has 200% storage overhead.
Hadoop 3.x - With erasure coding, storage overhead is only 50%.

2.7 Storage Overhead Example

Hadoop 2.x - If there are 6 blocks, the 3x replication scheme stores 18 blocks in total.
Hadoop 3.x - If there are 6 blocks, 9 blocks are stored in total: 6 data blocks and 3 parity blocks.
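
The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, and the 6 data plus 3 parity layout is taken from the example above. The erasure coding function assumes the number of blocks is a multiple of the group size, to keep the calculation simple.

```python
def replication_footprint(data_blocks, replication_factor=3):
    """Raw blocks stored when every block is kept replication_factor times."""
    return data_blocks * replication_factor


def erasure_coding_footprint(data_blocks, data_units=6, parity_units=3):
    """Raw blocks stored when each group of data_units blocks gets parity_units parity blocks.

    Assumes data_blocks is a multiple of data_units.
    """
    groups = data_blocks // data_units
    return groups * (data_units + parity_units)


def overhead_percent(raw_blocks, data_blocks):
    """Extra storage beyond the original data, as a percentage."""
    return (raw_blocks - data_blocks) / data_blocks * 100


blocks = 6
replicated = replication_footprint(blocks)
encoded = erasure_coding_footprint(blocks)

print(replicated, overhead_percent(replicated, blocks))  # 18 200.0
print(encoded, overhead_percent(encoded, blocks))        # 9 50.0
```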

2.8 YARN Timeline Service

Hadoop 2.x - Uses the old Timeline service which has scalability issues.
Hadoop 3.x - Uses Timeline Service v2, which improves the scalability and reliability of the Timeline Service.

2.9 Default Port Range

Hadoop 2.x - Some of the default service ports fall within the Linux ephemeral port range, so a service can fail to bind at startup if another application is already using its port.
Hadoop 3.x - These ports have been moved out of the ephemeral range.
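
As an illustration, the sketch below checks a few default HDFS ports against an ephemeral port range. The range used here (32768 to 60999) is an assumption based on a common Linux default; the actual value on a given machine is in /proc/sys/net/ipv4/ip_local_port_range. The port numbers are the commonly documented defaults and should be verified against the hdfs-default.xml of your release.

```python
# Assumed Linux ephemeral port range (32768-60999); may differ per system.
EPHEMERAL_RANGE = range(32768, 61000)

# A few default HDFS ports: (Hadoop 2.x value, Hadoop 3.x value).
DEFAULT_PORTS = {
    "NameNode HTTP": (50070, 9870),
    "DataNode HTTP": (50075, 9864),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
}


def in_ephemeral_range(port):
    """Return True if the port falls inside the assumed ephemeral range."""
    return port in EPHEMERAL_RANGE


for service, (v2_port, v3_port) in DEFAULT_PORTS.items():
    print(
        f"{service}: 2.x port {v2_port} ephemeral={in_ephemeral_range(v2_port)}, "
        f"3.x port {v3_port} ephemeral={in_ephemeral_range(v3_port)}"
    )
```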

2.10 Tools

Hadoop 2.x - Works with Hive, Pig, Tez, Hama, Giraph, and other Hadoop tools.
Hadoop 3.x - Also works with Hive, Pig, Tez, Hama, Giraph, and other Hadoop tools.

2.11 Compatible File Systems

Hadoop 2.x - HDFS (the default file system); the FTP file system, which stores all its data on a remotely accessible FTP server; the Amazon S3 (Simple Storage Service) file system; and the Windows Azure Storage Blob (WASB) file system.
Hadoop 3.x - Supports all of the above as well as the Microsoft Azure Data Lake file system.

2.12 DataNode Resources

Hadoop 2.x - DataNode resources are not dedicated to MapReduce; they can be used by other applications.
Hadoop 3.x - DataNode resources can likewise be used by other applications.

2.13 MR API Compatibility

Hadoop 2.x - The MR API is compatible with Hadoop 1.x, so Hadoop 1.x programs can run on Hadoop 2.x.
Hadoop 3.x - The MR API is likewise compatible, so Hadoop 1.x programs can run on Hadoop 3.x.

2.14 Support for Microsoft Windows

Hadoop 2.x − It can be deployed on Windows.
Hadoop 3.x − It supports Windows as well.

2.15 Slots/Containers

Hadoop 2.x - Hadoop 1 worked on the concept of slots, but Hadoop 2.x works on the concept of containers. Containers can run generic tasks, not only MapReduce tasks.
Hadoop 3.x - Also works on the concept of containers.

2.16 Single Point of Failure

Hadoop 2.x - Has features to overcome the single point of failure (SPOF), so whenever the NameNode fails, it recovers automatically.
Hadoop 3.x - Also has features to overcome the SPOF, so whenever the NameNode fails, it recovers automatically and no human intervention is required. Hadoop 3.x additionally supports more than two NameNodes.

2.17 HDFS Federation

Hadoop 2.x - In Hadoop 1.0, a single NameNode managed all namespaces, but in Hadoop 2.0, multiple NameNodes are used for multiple namespaces.
Hadoop 3.x - Hadoop 3.x also has multiple NameNodes for multiple namespaces.

2.18 Scalability

Hadoop 2.x - We can scale up to 10,000 nodes per cluster.
Hadoop 3.x - Better scalability. We can scale to over 10,000 nodes per cluster.

2.19 Faster access to data

Hadoop 2.x - Data can be accessed quickly thanks to the DataNode cache.
Hadoop 3.x - Data can likewise be accessed quickly through the DataNode cache.

2.20 HDFS Snapshot

Hadoop 2.x - Hadoop 2 added support for snapshots, which provide disaster recovery and protection against user errors.
Hadoop 3.x - Hadoop 3 also supports snapshot functionality.

2.21 Platform

Hadoop 2.x - Can serve as a platform for a variety of data analytics workloads, including event processing, streaming, and real-time operations.
Hadoop 3.x - Event processing, streaming, and real-time operations can likewise run on top of YARN.

2.22 Cluster Resource Management

Hadoop 2.x - Uses YARN for cluster resource management. YARN improves scalability, high availability, and multi-tenancy.
Hadoop 3.x - Also uses YARN for cluster resource management, with all of these features.

Improvements of Hadoop 3.x over Hadoop 2.x

Common major improvements:
Shell script rewrite
Deprecated API removal

HDFS improvements:
Support for erasure coding
Support for more than two NameNodes
Intra-DataNode data balancing
Multiple service ports have changed

YARN improvements:
YARN Timeline Service v.2
Support for Opportunistic Containers and Distributed Scheduling

MapReduce improvements:
MapReduce task-level native optimization
Reworked daemon and task heap management

Other new features:
Shared client jars

Conclusion

Having covered 22 differences between Hadoop 2.x and Hadoop 3.x and the improvements in 3.x, you can now judge which version, Hadoop 2 or Hadoop 3, better suits your needs.
