1. Flink Overview1.1 Basic IntroductionThe main features include: batch and stream integration, precise state management, event time support, and exactly-once state consistency guarantee. Flink can not only run on a variety of resource management frameworks including YARN, Mesos, and Kubernetes, but also supports independent deployment on bare metal clusters. With the high availability option enabled, there is no single point of failure. There are two concepts to explain here:
1.2 Application ScenariosData Driven Event-driven applications do not need to query remote databases. Local data access enables them to have higher throughput and lower latency. Taking the anti-fraud case as an example, DataDriven writes the processing rule model into the DatastreamAPI, and then abstracts the entire logic to the Flink engine. When events or data flow in, the corresponding rule model will be triggered. Once the conditions in the rule are triggered, DataDriven will quickly process it and notify the business application. Data Analytics Compared with batch analysis, streaming analysis eliminates the need for periodic data import and query processes, so the latency of obtaining indicators from events is lower. In addition, batch queries must deal with artificial data boundaries caused by periodic imports and input bounds, while streaming queries do not need to consider this problem. Flink provides good support for both continuous streaming analysis and batch analysis, and processes and analyzes data in real time. It is widely used in scenarios such as real-time large screens and real-time reports. Data Pipeline Compared with periodic ETL tasks, continuous data pipelines can significantly reduce the latency of moving data to the destination. For example, based on the upstream StreamETL, real-time data cleaning or expansion can be performed, and a real-time data warehouse can be built downstream to ensure the timeliness of data queries and form a high-efficiency data query link. This scenario is very common in media stream recommendations or search engines. 2. Environment Deployment2.1. Installation package management
2.2 Cluster ConfigurationManagement Node
Distributed Nodes
The two configurations are synchronized to all cluster nodes. 2.3. Start and stop
Startup log:
2.4 Web Interface Visit: 3. Development Entry Case3.1 Data ScriptDistribute a data script to each node:
3.2. Introducing basic dependenciesHere is a basic case written in Java. <dependencies> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.7.0</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.7.0</version> </dependency> </dependencies> 3.3. Read file dataHere, the data in the file is read directly, and the number of times each word appears is analyzed through the program flow. public class WordCount { public static void main(String[] args) throws Exception { // Read file data readFile (); } public static void readFile () throws Exception { // 1. Create execution environment ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment(); // 2. Read data file String filePath = "/var/flink/test/word.txt"; DataSet<String> inputFile = environment.readTextFile(filePath); // 3. Group and sum DataSet<Tuple2<String, Integer>> wordDataSet = inputFile.flatMap(new WordFlatMapFunction( )).groupBy(0).sum(1); // 4. Print processing results wordDataSet.print(); } //Data reading and cutting method static class WordFlatMapFunction implements FlatMapFunction<String, Tuple2<String, Integer>> { @Override public void flatMap(String input, Collector<Tuple2<String, Integer>> collector){ String[] wordArr = input.split(","); for (String word : wordArr) { collector.collect(new Tuple2<>(word, 1)); } } } } 3.4. Read port dataCreate a port on the hop01 service and simulate sending some data to the port:
Use the Flink program to read and analyze the data content of the port: public class WordCount { public static void main(String[] args) throws Exception { // Read port data readPort (); } public static void readPort () throws Exception { // 1. Create execution environment StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment(); // 2. Read the Socket data port DataStreamSource<String> inputStream = environment.socketTextStream("hop01", 5566); // 3. Data reading and cutting method SingleOutputStreamOperator<Tuple2<String, Integer>> resultDataStream = inputStream.flatMap( new FlatMapFunction<String, Tuple2<String, Integer>>() { @Override public void flatMap(String input, Collector<Tuple2<String, Integer>> collector) { String[] wordArr = input.split(","); for (String word : wordArr) { collector.collect(new Tuple2<>(word, 1)); } } }).keyBy(0).sum(1); // 4. Print analysis results resultDataStream.print(); // 5. Environment startup environment.execute(); } } IV. Operation Mechanism4.1、FlinkClientThe client is used to prepare and send data streams to the JobManager node. Then, according to specific needs, the client can directly disconnect or maintain the connection status and wait for the task processing results. 4.2 JobManagerIn a Flink cluster, a JobManger node and at least one TaskManager node are started. After the JobManager receives the task submitted by the client, it coordinates and sends the task to a specific TaskManager node for execution. The TaskManager node sends heartbeat and processing information to the JobManager. 4.3 TaskManagerA slot is the smallest resource scheduling unit in TaskManager. The number of slots is set at startup. Each slot can start a task, receive tasks deployed by the JobManager node, and perform specific analysis and processing. 5. Source code addressGitHub Address https://github.com/cicadasmile/big-data-parent GitEE Address https://gitee.com/cicadasmile/big-data-parent The above is a brief discussion of the details of the construction and operation mechanism of the real-time computing framework Flink cluster. For more information about the construction and operation mechanism of the real-time computing framework Flink cluster, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: The visual design path of the website should conform to user habits
>>: Solution to the problem of MySQL deleting and inserting data very slowly
Preface In this article, we will use Docker to bu...
Suddenly, I needed to build a private service for...
Install FFmpeg flac eric@ray:~$ sudo apt install ...
As Web developers, although we are not profession...
In a web page, the <input type="file"...
1. Business Background Using a mask layer to shie...
MySQL 5.7 adds many new features, such as: Online...
Table of contents Problem Description 1. Basic so...
Table of contents Preface What is index pushdown?...
Now that we have finished the transform course, l...
Table of contents 1. Usage of DATETIME and TIMEST...
When using TensorFlow for deep learning, insuffic...
Table of contents Introduction The following is a...
Today, I logged into the server and prepared to m...
This article example shares the specific code of ...