1. Spark vs. Hadoop
1.1 Disadvantages of Hadoop
1.2 Advantages over Hadoop MR
2. Spark Ecosystem
2.1 Three Types of Big Data Processing
1. Complex batch data processing: the time span ranges from tens of minutes to several hours. Typical framework: Hadoop MapReduce.
2. Interactive queries over historical data: the time span ranges from tens of seconds to several minutes. Typical framework: Cloudera Impala, whose real-time performance is better than that of Hive.
3. Processing of real-time data streams: the time span ranges from hundreds of milliseconds to several seconds. Typical framework: Storm.
2.2 BDAS Architecture
2.3 Spark Ecosystem
3. Basic concepts and architecture design
3.1 Basic Concepts
3.2 Operational Architecture
Advantages of Spark using Executors (compared to Hadoop MapReduce):
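Section 4.2 notes that an Executor is a resident process that runs Tasks on multiple threads. This avoids starting a new process for every task, as Hadoop MapReduce typically does. The minimal Python sketch below illustrates only that idea and is not Spark's API. The `ToyExecutor` class and the `word_count` task are hypothetical names, and the standard-library `ThreadPoolExecutor` stands in for an Executor's thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ToyExecutor:
    """Hypothetical stand-in for a Spark Executor: one long-lived process
    that runs many tasks on a reusable pool of threads."""

    def __init__(self, num_threads: int = 4) -> None:
        # Created once and reused for every task, so there is no per-task
        # process startup. (In CPython the GIL limits CPU parallelism for
        # pure-Python tasks; the point here is process reuse, not speedup.)
        self._pool = ThreadPoolExecutor(max_workers=num_threads)

    def run_tasks(self, task: Callable[[T], R], partitions: Sequence[T]) -> List[R]:
        # One task per data partition; all threads share this process's memory.
        futures = [self._pool.submit(task, part) for part in partitions]
        # Results are collected in partition order, not completion order.
        return [f.result() for f in futures]

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def word_count(lines: List[str]) -> int:
    """Hypothetical task: count the words in one partition."""
    return sum(len(line.split()) for line in lines)


if __name__ == "__main__":
    executor = ToyExecutor(num_threads=2)
    print(executor.run_tasks(word_count, [["a b c", "d e"], ["f g"], ["h"]]))  # [5, 2, 1]
    # The same pool (the "resident" Executor) handles a second job.
    print(executor.run_tasks(word_count, [["x y"]]))  # [2]
    executor.shutdown()
```

Because the pool outlives any single call to `run_tasks`, a second job reuses the same threads. In a real Spark deployment, Tasks running in the same Executor likewise share that process's memory, including cached data.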
3.3 Relationships between various concepts
When an Application is executed, the Driver requests resources from the cluster manager, starts the Executors, sends the application code and files to them, and then runs the Tasks on the Executors. When the run is completed, the results are returned to the Driver or written to HDFS or another database.
4. Basic process of running Spark
4.1 Operation Process
1. Build the basic runtime environment for the application: the Driver creates a SparkContext, which applies for resources, allocates tasks, and monitors them.
2. The resource manager allocates resources to the Executors and starts the Executor processes.
3. SparkContext builds a DAG from the dependencies between RDDs and submits it to the DAGScheduler, which splits it into Stages. Each Stage's set of Tasks (a TaskSet) is passed to the TaskScheduler, which dispatches the Tasks to the Executors for execution, and SparkContext sends the application code to the Executors.
4. The Tasks run on the Executors and report their results to the TaskScheduler and then to the DAGScheduler. When execution is complete, the data is written out and all resources are released.
4.2 Operational Architecture Features
1. Each Application has its own Executor processes, which stay resident for as long as the Application is running. An Executor process runs Tasks on multiple threads.
2. The Spark runtime process is independent of the resource manager: it only needs to be able to obtain Executor processes and keep communicating with them.
3. Tasks use optimization mechanisms such as data locality and speculative execution (computation is moved to the data).
5. Spark deployment and application methods
5.1 Three deployment methods of Spark
5.1.1 Standalone
Similar to MapReduce 1.0, this mode uses the slot as the unit of resource allocation, but its performance is not good.
5.1.2 Spark on Mesos
Mesos and Spark have a certain affinity: both came out of UC Berkeley, and Spark was originally developed to run on Mesos.
5.1.3 Spark on YARN
(Figure: the relationship between Mesos and YARN)
5.2 From Hadoop+Storm Architecture to Spark Architecture
Hadoop+Storm architecture: this deployment method is relatively complicated.
Using the Spark architecture to meet both batch and stream processing needs: Spark simulates stream computing with fast small-batch computation. This is not true stream computing and cannot reach millisecond-level latency, so enterprise applications that require millisecond-level real-time response still need a stream computing framework such as Storm.
Advantages of Spark architecture:
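Section 5.2 describes Spark simulating stream computing with fast small batches, which lets one architecture serve both batch and stream processing. The minimal Python sketch below illustrates that micro-batch idea; it is not Spark Streaming's actual API. It groups timestamped records into fixed intervals and applies the same ordinary batch function to each one. The names `micro_batch`, `batch_word_count`, and `run_stream` are hypothetical, and the sketch assumes non-negative timestamps in seconds.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

R = TypeVar("R")


def micro_batch(events: Iterable[Tuple[float, str]], interval: float) -> List[List[str]]:
    """Group (timestamp_seconds, record) pairs into consecutive windows of
    `interval` seconds starting at t = 0; empty windows are kept.
    Assumes non-negative timestamps."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for ts, record in events:
        buckets[int(ts // interval)].append(record)
    if not buckets:
        return []
    last = max(buckets)
    # .get() does not insert keys, so missing windows come back as [].
    return [buckets.get(i, []) for i in range(last + 1)]


def batch_word_count(records: List[str]) -> Dict[str, int]:
    """An ordinary batch job: count the words in one batch of records."""
    counts: Dict[str, int] = {}
    for record in records:
        for word in record.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def run_stream(
    events: Iterable[Tuple[float, str]],
    interval: float,
    batch_fn: Callable[[List[str]], R],
) -> List[R]:
    """Simulate micro-batch streaming: run the same batch function on each window."""
    return [batch_fn(batch) for batch in micro_batch(events, interval)]


if __name__ == "__main__":
    events = [(0.1, "a b"), (0.4, "a"), (1.2, "b"), (2.7, "c a")]
    print(run_stream(events, 1.0, batch_word_count))
    # [{'a': 2, 'b': 1}, {'b': 1}, {'c': 1, 'a': 1}]
```

The sketch also shows why this approach cannot reach millisecond-level latency. A record is processed only after its batch interval closes, which adds up to one full interval of delay on top of the processing time.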
5.3 Unified Deployment of Hadoop and Spark
Different computing frameworks run together on YARN. The benefits are as follows:
Current status:
1. Spark cannot yet replace the functionality of some components in the Hadoop ecosystem.
2. Fully migrating existing applications built with Hadoop components to Spark incurs a certain cost.
This is the end of this article on Spark and how it compares with Hadoop. For more Spark and Hadoop content, please search previous articles on 123WORDPRESS.COM.