Analyzing Linux high-performance network IO and Reactor model

1. Introduction to basic concepts

  • Process (thread) switching: the operating system can suspend a currently running process and later resume a previously suspended one
  • Process (thread) blocking: a running process sometimes has to wait for some event to complete, such as acquiring a lock or an I/O read or write. While it waits, the system blocks the process, and a blocked process does not occupy the CPU
  • File descriptor: in Linux, a file descriptor is an abstraction describing a reference to a file; it is a non-negative integer. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process
  • Linux signal handling: a Linux process can receive signals from the system or from other processes while it runs, and then execute the handler registered for that signal value; a signal is essentially a software simulation of a hardware interrupt

User space, kernel space, and buffers were introduced in the chapter on the zero-copy mechanism, so they are omitted here.

2. Network IO reading and writing process

  • When a read is initiated on a socket from user space, a context switch occurs and the user process blocks: (R1) it waits for the network data to arrive and be copied from the network card into the kernel buffer; (R2) the data is then copied from the kernel buffer into the user process buffer. The process is then switched back in and resumes, processing the data it obtained
  • We give the first stage of the socket read operation the alias R1, and call the second stage R2
  • When a send is initiated on a socket from user space, a context switch occurs and the user process blocks, waiting for the data to be copied from the user process buffer into the kernel buffer. Once the copy completes, the process is switched back in and resumes

3. Five Linux network IO models

3.1 Blocking I/O

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);

  • Blocking I/O is the most basic and simplest I/O model: all operations are performed sequentially
  • In the blocking IO model, the user-space application issues a system call (recvfrom), which blocks the application until the data in the kernel buffer is ready and has been copied from the kernel to the user process. Only then is the process woken up by the system to handle the data
  • The whole process is blocked through both consecutive stages, R1 and R2

3.2 Nonblocking IO

  • Non-blocking IO is still a kind of synchronous IO. It is implemented with a polling mechanism on a socket opened in non-blocking mode: an I/O operation does not complete immediately but instead returns an error code (EWOULDBLOCK) indicating that the operation is not yet done
  • The process polls the kernel for data, and recvfrom returns EWOULDBLOCK while the data is not ready. The process keeps reissuing the recvfrom call, though it can of course do other things in between
  • Once the kernel data is ready, it is copied to user space, the call returns without an error code, and the process goes on to handle the data. Note that during the entire data copy, the process is still blocked
  • So the process is blocked in the R2 stage; although it is not blocked in R1, it has to poll continuously

3.3 Multiplexing I/O (IO multiplexing)

  • A backend service usually holds a large number of socket connections. If we could query the read/write status of many sockets at once and process whichever ones are ready, efficiency would be much higher. That is "I/O multiplexing": "multi" refers to the multiple sockets, and "plexing" refers to reusing the same process (thread) to monitor them all
  • Linux provides several multiplexed I/O implementations: select, poll, and epoll
  • select, poll, and epoll are themselves blocking calls
  • Unlike blocking IO, select does not wait until the data for every socket has arrived; it resumes the user process as soon as some sockets are ready. How does it know that some data is ready in the kernel? The kernel tracks readiness on the process's behalf
  • The process is still blocked in both the R1 and R2 stages; however, there is a trick for R1: in a multi-process or multi-threaded program, we can dedicate a single process (thread) to the blocking select call, which frees the other threads

3.4 Signal-driven I/O (SIGIO)

  • A signal-catching function must be provided and associated with the socket; after the sigaction call is issued, the process is free to handle other work
  • When the data is ready in the kernel, the process receives a SIGIO signal, interrupts into the signal-catching function, calls recvfrom to read the data from the kernel into user space, and then processes it
  • So the user process is not blocked in the R1 stage, but it still blocks waiting in R2

3.5 Asynchronous IO (POSIX aio_ series functions)

  • Compared with synchronous IO, asynchronous IO does not block the current process after it issues an asynchronous read (aio_read) system call, whether or not the kernel buffer data is ready; the process can go on to other logic as soon as aio_read returns
  • When the socket data is ready in the kernel, the system copies it directly from the kernel to user space and then notifies the user process, for example with a signal
  • The process is non-blocking in both the R1 and R2 stages

4. A Deep Understanding of Multiplexed IO

4.1 select

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

1) Use copy_from_user to copy fd_set from user space to kernel space

2) Register callback function __pollwait

3) Traverse all fds and call their corresponding poll methods (for sockets, this poll method is sock_poll, sock_poll will call tcp_poll, udp_poll or datagram_poll depending on the situation)

4) Taking tcp_poll as an example, its core implementation is __pollwait, which is the callback function registered above

5) The main job of __pollwait is to hang current (the calling process) on the device's wait queue. Different devices have different wait queues; for tcp_poll the wait queue is sk->sk_sleep (note that hanging a process on a wait queue does not by itself put it to sleep). When the device receives a message (network device) or finishes filling in file data (disk device), it wakes up the processes sleeping on its wait queue, and current is awakened

6) When the poll method returns, it returns a mask describing whether read and write operations are ready, and fd_set is assigned values according to this mask

7) If traversing all fds has not produced a readable or writable mask, schedule_timeout is called to put the process that called select (i.e. current) to sleep

8) When the device driver's own resources become readable or writable, it wakes up the processes sleeping on its wait queue. If nothing wakes the process within the given timeout (specified by timeout), the process calling select is woken anyway, regains the CPU, and re-traverses the fds to check whether any are ready

9) Copy fd_set from kernel space to user space

Disadvantages of select:

  • Every call to select copies the fd set from user space to kernel space; this overhead becomes large when there are many fds
  • Every call to select also makes the kernel traverse all the fds passed in, which is likewise expensive when there are many fds
  • select supports too few file descriptors; the default limit is 1024

4.2 epoll

int epoll_create(int size);  
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);  
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

  • When epoll_create is called, the kernel builds a red-black tree in its cache to store the sockets later submitted via epoll_ctl, and also creates an rdllist doubly linked list to hold ready events. epoll_wait only needs to examine the rdllist
  • When epoll_ctl adds, modifies, or deletes an event on an epoll object, it operates on the rbr red-black tree, which is very fast
  • Every event added to epoll establishes a callback relationship with the device (such as the network card). When the corresponding event occurs on the device, the callback adds the event to the rdllist; in the kernel this callback is called ep_poll_callback

Two trigger modes of epoll:

epoll has two trigger modes: EPOLLLT and EPOLLET. LT is the default mode; ET is the "high-speed" mode (supported only on non-blocking sockets).

  • In LT (level-triggered) mode, as long as the file descriptor still has data to read, every epoll_wait call reports its read event
  • In ET (edge-triggered) mode, when an I/O event is detected, epoll_wait returns the file descriptors with event notifications. A readable descriptor must then be read until it is drained (i.e. until the read returns EWOULDBLOCK); otherwise the next epoll_wait will not report the event again

4.3 Advantages of epoll over select

Solve three shortcomings of select:

  • For the first shortcoming: epoll's solution lies in epoll_ctl. Each time a new event is registered on the epoll handle (EPOLL_CTL_ADD in epoll_ctl), that fd is copied into the kernel once, instead of being copied repeatedly on every epoll_wait. epoll guarantees each fd is copied only once over the whole lifetime (epoll_wait does no fd copying)
  • For the second shortcoming: epoll attaches a callback to each fd. When a device becomes ready and wakes the waiters on its wait queue, the callback runs and adds the ready fd to a ready list. epoll_wait's job is merely to check whether this ready list is non-empty (no traversal needed)
  • As for the third shortcoming: epoll has no such limit. The upper bound on the fds it supports is the maximum number of open files, which is generally far larger than 1024; on a machine with 1GB of memory it is around 100,000. This number is closely tied to system memory

High performance of epoll:

  • epoll stores the monitored file descriptor events in a red-black tree, so epoll_ctl adds, deletes, and modifies them quickly
  • epoll can obtain the ready fds without traversal, by returning the ready list directly
  • It is often claimed that epoll (since Linux 2.6) uses mmap so that data no longer needs to be copied from kernel to user space; in fact epoll_wait still copies the ready events to user space, but this per-event copy is tiny compared with select's full fd_set copy on every call

4.4 On whether the epoll IO model is synchronous or asynchronous

Concept Definition:

  • Synchronous I/O operation: blocks the requesting process until the I/O operation completes
  • Asynchronous I/O operation: does not block the requesting process. The process only handles the notification after the I/O operation completes and does not actively read or write the data; the system kernel performs the actual reading and writing
  • Blocking vs. non-blocking: whether the data the process/thread wants to access is ready, and whether the process/thread has to wait

Asynchronous IO by definition requires non-blocking I/O calls. As mentioned earlier, an I/O operation has two stages: R1 waits for the data to be ready, and R2 copies the data from the kernel to the process. With epoll, the process still blocks in R1 inside epoll_wait, and the subsequent read call still blocks while the data is copied in R2. epoll is therefore classified as synchronous IO.

5. Reactor Model

The core idea of Reactor is to register all I/O events to be handled on a central I/O multiplexer, with the main thread/process blocked on that multiplexer; once an I/O event arrives or becomes ready, the multiplexer returns and dispatches the pre-registered I/O event to its corresponding handler.

5.1 Introduction to related concepts

  • Event: a state; for example, a read-ready event is the state in which data can be read from the kernel
  • Event separator (demultiplexer): waiting for events is generally delegated to epoll or select; since events arrive randomly and asynchronously, epoll must be called in a loop. The module that encapsulates this in a framework is the event separator (roughly, a wrapper around epoll)
  • Event handler: once an event occurs, some process or thread must handle it; that handler is the event handler, usually a different thread from the event separator

5.2 General Process of Reactor

1) The application registers the read-write ready event and the read-write ready event handler in the event separator

2) The event separator waits for the read-write ready event to occur

3) The read-write ready event occurs, activating the event separator, which calls the read-write ready event handler

4) The event handler first reads the data from the kernel to user space, and then processes the data

5.3 Single Thread + Reactor

5.4 Multithreading + Reactor

5.5 Multithreading + Multiple Reactors

6. General process of Proactor model

1) The application registers the read completion event and read completion event handler in the event separator and sends an asynchronous read request to the system

2) The event separator waits for the completion of the read event

3) While the separator waits, the system uses a parallel kernel thread to perform the actual read operation, copies the data into the process buffer, and finally notifies the event separator that the read is complete

4) The event separator receives the read-completion event and activates the read-completion event handler

5) The read completion event handler directly processes the data in the user process buffer

6.1 Differences between Proactor and Reactor

  • Proactor is based on the concept of asynchronous I/O, while Reactor is generally based on multiplexed I/O
  • Proactor does not need to copy the data from kernel to user space itself; that step is performed by the system
