A brief discussion on the construction and operation mechanism of the real-time computing framework Flink cluster

1. Flink Overview

1.1 Basic Introduction

Apache Flink is a framework and distributed processing engine for stateful computation over unbounded and bounded data streams. Its main features include: unified batch and stream processing, precise state management, event-time support, and exactly-once state consistency guarantees. Flink can run on a variety of resource management frameworks, including YARN, Mesos, and Kubernetes, and also supports standalone deployment on bare-metal clusters. With the high-availability option enabled, it has no single point of failure.

Two concepts deserve explanation here (see the sketch after this list):

  • Boundedness: data streams can be unbounded (no defined end, processed continuously as data arrives) or bounded (a finite dataset with a known start and end that can be processed as a batch);
  • State: whether there is a dependency in the execution order, that is, whether the next computation depends on previous results;
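
A minimal Java sketch contrasting the two boundary types, reusing the file path and socket address that appear in the examples later in this article; the class name is illustrative:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundaryExample {
    public static void main(String[] args) throws Exception {
        // Bounded: a finite file, processed with the batch API (DataSet).
        ExecutionEnvironment batchEnv = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> bounded = batchEnv.readTextFile("/var/flink/test/word.txt");

        // Unbounded: a socket stream with no defined end, processed with the streaming API (DataStream).
        StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> unbounded = streamEnv.socketTextStream("hop01", 5566);
    }
}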

1.2 Application Scenarios

Event Driven

Event-driven applications do not need to query a remote database; local data access gives them higher throughput and lower latency. Taking anti-fraud as an example, the processing rule model is written with the DataStream API and the whole logic is submitted to the Flink engine. When events or data flow in, the corresponding rule model is triggered; once a rule's conditions are met, the application processes the event quickly and notifies the business system.
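
As a hedged sketch of that idea, the rule model below is a trivial stand-in (a single-transaction amount threshold); the event format, host, port, and threshold are all hypothetical:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FraudRuleExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input: one "userId,amount" record per line.
        DataStream<String> events = env.socketTextStream("hop01", 5566);

        // Stand-in rule model: flag any single transaction over 10000.
        events.filter(line -> Double.parseDouble(line.split(",")[1]) > 10000)
              .map(line -> "ALERT: suspicious transaction -> " + line)
              .print(); // A real application would notify the business system here.

        env.execute("fraud-rule-demo");
    }
}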

Data Analytics

Compared with batch analysis, streaming analysis eliminates the need for periodic data imports and scheduled query runs, so the latency from event to metric is lower. In addition, batch queries must deal with the artificial boundaries that periodic imports impose on the input, while continuous streaming queries do not. Flink provides good support for both continuous streaming analysis and batch analysis, processing and analyzing data in real time; it is widely used in scenarios such as real-time dashboards and real-time reports.
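
A minimal hedged sketch of such a continuously updated metric: counting page views per page over ten-second tumbling windows. The input format, host, and port are hypothetical:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class PageViewAnalytics {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input: one page name per line.
        DataStream<String> pages = env.socketTextStream("hop01", 5566);

        pages.map(new MapFunction<String, Tuple2<String, Integer>>() {
                 @Override
                 public Tuple2<String, Integer> map(String page) {
                     return new Tuple2<>(page, 1);
                 }
             })
             .keyBy(0)                          // group by page name
             .timeWindow(Time.seconds(10))      // ten-second tumbling windows
             .sum(1)                            // view count per page per window
             .print();

        env.execute("page-view-analytics");
    }
}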

Data Pipeline

Compared with periodic ETL jobs, a continuous data pipeline significantly reduces the latency of moving data to its destination. For example, streaming ETL can clean or enrich data in real time upstream, while a real-time data warehouse is built downstream, ensuring timely data queries and forming an efficient query link. This scenario is very common in media stream recommendation and search engines.

2. Environment Deployment

2.1 Installation Package Management

[root@hop01 opt]# tar -zxvf flink-1.7.0-bin-hadoop27-scala_2.11.tgz

[root@hop01 opt]# mv flink-1.7.0 flink1.7

2.2 Cluster Configuration

Management Node

[root@hop01 opt]# cd /opt/flink1.7/conf

[root@hop01 conf]# vim flink-conf.yaml

jobmanager.rpc.address: hop01
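
A few other entries in flink-conf.yaml are commonly adjusted alongside the RPC address. The values below are illustrative, not recommendations; verify the keys against the flink-conf.yaml shipped with your version:

# Illustrative settings -- tune per machine.
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2
rest.port: 8081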

Worker Nodes

[root@hop01 conf]# vim slaves

hop02

hop03

Synchronize both configuration files to all cluster nodes.
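
One way to do this, assuming passwordless SSH between the nodes, is scp:

[root@hop01 conf]# scp -r /opt/flink1.7/conf root@hop02:/opt/flink1.7/
[root@hop01 conf]# scp -r /opt/flink1.7/conf root@hop03:/opt/flink1.7/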

2.3 Start and Stop

/opt/flink1.7/bin/start-cluster.sh

/opt/flink1.7/bin/stop-cluster.sh

Startup log:

[root@hop01 conf]# /opt/flink1.7/bin/start-cluster.sh

Starting cluster.

Starting standalonesession daemon on host hop01.

Starting taskexecutor daemon on host hop02.

Starting taskexecutor daemon on host hop03.

2.4 Web Interface

Visit: http://hop01:8081/
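
The same information is also exposed through a REST API on the same port; for example, the cluster overview endpoint returns JSON with TaskManager and slot counts:

[root@hop01 ~]# curl http://hop01:8081/overview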

3. Getting Started Example

3.1 Data Script

Distribute a test data file to each node:

/var/flink/test/word.txt
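
The programs below split each line on commas, so the file can contain a few comma-separated words; a hypothetical sample:

c++,java
java,flink
flink,scala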

3.2 Introducing Basic Dependencies

This basic example is written in Java and requires the following Maven dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.7.0</version>
    </dependency>
</dependencies>

3.3 Read File Data

Here, the program reads the file directly and counts how many times each word appears.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Read file data
        readFile();
    }

    public static void readFile() throws Exception {
        // 1. Create the execution environment
        ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();

        // 2. Read the data file
        String filePath = "/var/flink/test/word.txt";
        DataSet<String> inputFile = environment.readTextFile(filePath);

        // 3. Group by word and sum the counts
        DataSet<Tuple2<String, Integer>> wordDataSet =
                inputFile.flatMap(new WordFlatMapFunction()).groupBy(0).sum(1);

        // 4. Print the processing results (print() triggers execution in the batch API)
        wordDataSet.print();
    }

    // Splits each line into words and emits (word, 1) pairs
    static class WordFlatMapFunction implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String input, Collector<Tuple2<String, Integer>> collector) {
            String[] wordArr = input.split(",");
            for (String word : wordArr) {
                collector.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
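
With the hypothetical sample word.txt shown above, print() would emit one (word, count) tuple per distinct word, along the lines of the following (ordering may vary):

(c++,1)
(java,2)
(flink,2)
(scala,1)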

3.4 Read Port Data

Open a listening port on hop01 with netcat and send it some test data:

[root@hop01 ~]# nc -lk 5566

c++,java

Use a Flink program to read and analyze the data arriving on the port:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Read port data
        readPort();
    }

    public static void readPort() throws Exception {
        // 1. Create the streaming execution environment
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();

        // 2. Read from the socket data port
        DataStreamSource<String> inputStream = environment.socketTextStream("hop01", 5566);

        // 3. Split each line into words, emit (word, 1) pairs, then key by word and sum
        SingleOutputStreamOperator<Tuple2<String, Integer>> resultDataStream = inputStream.flatMap(
                new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String input, Collector<Tuple2<String, Integer>> collector) {
                String[] wordArr = input.split(",");
                for (String word : wordArr) {
                    collector.collect(new Tuple2<>(word, 1));
                }
            }
        }).keyBy(0).sum(1);

        // 4. Print the analysis results
        resultDataStream.print();

        // 5. Start the streaming job
        environment.execute();
    }
}
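
To run either example on the cluster rather than from the IDE, package it as a jar and submit it with the flink CLI; the main class and jar path here are hypothetical:

[root@hop01 ~]# /opt/flink1.7/bin/flink run -c com.example.WordCount /opt/jobs/word-count.jar

Note that for a streaming job running on the cluster, print() output goes to the TaskManager logs rather than the client console.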

4. Operation Mechanism

4.1 Flink Client

The client prepares the dataflow and sends it to the JobManager node. After submitting, the client can either disconnect immediately or stay connected and wait for the task's processing results.

4.2 JobManager

A Flink cluster starts one JobManager node and at least one TaskManager node. After the JobManager receives a task submitted by the client, it coordinates the task and dispatches it to specific TaskManager nodes for execution. The TaskManager nodes send heartbeats and processing statistics back to the JobManager.

4.3 TaskManager

A slot is the smallest unit of resource scheduling in a TaskManager. The number of slots is fixed when the TaskManager starts. Each slot can run one task, receiving tasks deployed by the JobManager node and performing the actual analysis and processing.
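
The slot count is set in flink-conf.yaml; a common starting point is one slot per CPU core (the value below is illustrative):

taskmanager.numberOfTaskSlots: 2

With the two worker nodes configured above, this would give the cluster four slots in total.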

5. Source code address

GitHub Address

https://github.com/cicadasmile/big-data-parent

GitEE Address

https://gitee.com/cicadasmile/big-data-parent

This concludes the brief discussion of the construction and operation mechanism of the real-time computing framework Flink cluster.
