1. Spark vs. Hadoop
1.1 Disadvantages of Hadoop
1.2 Advantages over Hadoop MR
2. Spark Ecosystem
2.1 Three Types of Big Data Processing
1. Complex batch data processing: the time span ranges from tens of minutes to several hours. Example: Hadoop MapReduce.
2. Interactive queries on historical data: the time span ranges from tens of seconds to several minutes. Example: Cloudera Impala, whose real-time performance is better than that of Hive.
3. Data processing on real-time data streams: the time span ranges from hundreds of milliseconds to several seconds. Example: Storm.
2.2 BDAS Architecture
2.3 Spark Ecosystem
3. Basic concepts and architecture design
3.1 Basic Concepts
3.2 Operational Architecture
Advantages of Spark using Executor (compared to Hadoop MR):
3.3 Relationships between various concepts
When executing an Application, the Driver requests resources from the cluster manager, starts the Executors, and sends the application code and files to them. The Tasks then run on the Executors. When the run completes, the results are returned to the Driver or written to HDFS or another database.
4. Basic Spark Execution Process
4.1 Operation Process
1. Build the basic operating environment for the Application: the Driver creates a SparkContext, which applies for resources, allocates tasks, and monitors them.
2. The resource manager allocates resources to the Executors and starts the Executor processes.
4. Tasks run on the Executors and report their results to the TaskScheduler, which passes them on to the DAGScheduler. When execution finishes, the data is written out and all resources are released.
4.2 Operational Architecture Features
1. Each Application has its own Executor processes, which remain resident for as long as the Application is running. An Executor process runs its Tasks as multiple threads.
2. The Spark execution process is independent of the resource manager: it only needs to obtain Executor processes and keep communicating with them.
3. Task scheduling uses optimizations such as data locality and speculative execution (computation moves closer to the data).
5. Spark deployment and application methods
5.1 Three deployment methods of Spark
5.1.1 Standalone
Similar to MR1.0, with the slot as the resource allocation unit, but the performance is not good.
5.1.2 Spark on Mesos
Mesos and Spark have a certain affinity.
5.1.3 Spark on YARN
The connection between Mesos and YARN
5.2 From Hadoop+Storm Architecture to Spark Architecture
Hadoop+Storm architecture
This deployment method is more complicated.
Using the Spark architecture to meet batch and stream processing needs
Spark simulates stream computing with fast small-batch computations, but this is not true stream computing and cannot achieve millisecond-level latency. Enterprise applications that require millisecond-level real-time responses still need a stream computing framework such as Storm.
Advantages of Spark architecture:
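Returning to the small-batch approach described in 5.2, the following is a minimal sketch in plain Python, not Spark code. It assumes events arrive sorted by a `ts` field measured in seconds; the names `micro_batches`, `count_batch`, and `batch_interval` are hypothetical and chosen only for illustration. The sketch suggests why a result for an event becomes available only once its window closes, so latency cannot drop below the batch interval.

```python
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[dict], batch_interval: float) -> Iterator[List[dict]]:
    """Group timestamped events into consecutive fixed-length time windows.

    Assumption: events are sorted by their "ts" field (seconds).
    """
    batch: List[dict] = []
    window_end = None
    for event in events:
        if window_end is None:
            # The first event opens the first window.
            window_end = event["ts"] + batch_interval
        while event["ts"] >= window_end:
            # The current window is over: emit it (it may be empty).
            yield batch
            batch = []
            window_end += batch_interval
        batch.append(event)
    if batch:
        # Emit the last, possibly partial, window.
        yield batch


def count_batch(batch: List[dict]) -> int:
    """A stand-in for the small batch job that runs once per window."""
    return len(batch)


events = [{"ts": t} for t in (0.0, 0.2, 0.9, 1.1, 2.5)]
counts = [count_batch(b) for b in micro_batches(events, batch_interval=1.0)]
```

With a one-second interval, the event at `ts=0.0` is not processed until the window closes at `ts=1.0`. Shrinking `batch_interval` lowers that delay but means more, smaller batch jobs, which is the trade-off behind the small-batch model.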
5.3 Unified Deployment of Hadoop and Spark
Different computing frameworks run uniformly on YARN. The benefits are as follows:
Status quo:
1. Spark cannot currently replace the functions implemented by some components in the Hadoop ecosystem.
2. Completely migrating existing applications developed with Hadoop components to Spark incurs a certain cost.
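As a closing illustration of the execution model from sections 3.3 and 4.2, here is a toy sketch in plain Python rather than Spark itself. `ToyExecutor` and `run_application` are hypothetical names: a thread pool stands in for an Executor process that stays resident for the whole Application and runs Tasks as threads, and the calling function plays the role of the Driver that starts the Executor and collects the results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


class ToyExecutor:
    """Stands in for one Executor process: a thread pool kept alive
    for the whole application (an illustrative model, not Spark)."""

    def __init__(self, threads: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=threads)

    def run_tasks(self, func: Callable, partitions: Sequence[Sequence[int]]) -> List:
        # One task per partition; tasks run on the pool's threads and
        # results come back in partition order.
        return list(self._pool.map(func, partitions))

    def shutdown(self) -> None:
        # Release the resources once the application has finished.
        self._pool.shutdown(wait=True)


def run_application(partitions: Sequence[Sequence[int]], threads: int = 2) -> Tuple[int, int]:
    """Plays the Driver: starts the executor, submits tasks, gathers results."""
    executor = ToyExecutor(threads)
    try:
        # Two rounds of tasks reuse the same resident executor
        # instead of starting a new process for each task.
        partial_sums = executor.run_tasks(sum, partitions)
        partial_maxes = executor.run_tasks(max, partitions)
    finally:
        executor.shutdown()
    return sum(partial_sums), max(partial_maxes)


total, largest = run_application([[1, 2, 3], [4, 5], [6]])
```

The design point being modeled is reuse: both rounds of tasks run on the same long-lived pool, so the startup cost is paid once per application rather than once per task.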