This article discusses several major zero-copy technologies in Linux and the scenarios in which they apply. To establish the concept quickly, let's start with a common scenario.

When writing a server program (a web server or file server), file download is a basic feature. The server's task is to send a file from the host's disk out through a connected socket, unmodified. We usually do this with code like the following:

    while ((n = read(diskfd, buf, BUF_SIZE)) > 0)
        write(sockfd, buf, n);

The loop repeatedly reads file content from disk into a buffer and then sends the buffer's content to the socket. Linux I/O is buffered by default, however, and the two system calls used here, read and write, hide what the operating system does underneath. In fact, several data copies take place.

When an application accesses a piece of data, the operating system first checks whether the file was accessed recently and whether its content is already cached in a kernel buffer. If so, it copies the content of the kernel buffer directly into the user-space buffer at the address buf passed to read. If not, it first copies the data from disk into the kernel buffer, a step that today is mostly done by DMA, and then copies the kernel buffer into the user buffer. Next, the write system call copies the user buffer into the kernel buffer associated with the network stack, and finally the content of that socket buffer is sent to the network card.

A picture makes this clearer:

[Figure: Data copy]

As the figure shows, four data copies take place in total: disk to kernel buffer, kernel buffer to user buffer, user buffer to socket buffer, and socket buffer to network card.
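The read/write loop described above can be written out more completely. The following is a minimal sketch in C, assuming blocking descriptors; the function name copy_read_write and the 4096-byte BUF_SIZE are choices made for this example and do not come from the original code. It adds the handling of short writes and interrupted calls that a real server would need, and the comments mark the two CPU copies.

```c
#include <errno.h>
#include <unistd.h>

#define BUF_SIZE 4096   /* assumed buffer size for this sketch */

/* Copy everything from in_fd to out_fd through a user-space buffer.
 * Returns the number of bytes copied, or -1 on error (errno is set). */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[BUF_SIZE];
    ssize_t total = 0;

    for (;;) {
        /* CPU copy: kernel buffer -> user buffer */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            return -1;
        }

        ssize_t off = 0;
        while (off < n) {
            /* CPU copy: user buffer -> socket (kernel) buffer.
             * write() may accept fewer bytes than requested. */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted, retry the write */
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

A caller would pass the open file as in_fd and the connected socket as out_fd, for example copy_read_write(diskfd, sockfd).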
Even with DMA handling the communication with the hardware, the CPU still has to perform two of these copies. In addition, several context switches between user mode and kernel mode occur, which further increases the CPU load.

## What is zero-copy technology?

The main goal of zero copy is to keep the CPU from copying data from one storage area to another. Zero-copy techniques reduce unnecessary copies or hand simple data-transfer work to other components, which frees the CPU for other tasks and makes more efficient use of system resources.

Let's go back to the example above. How can we reduce the number of times the data is copied? An obvious target is the copying back and forth between kernel space and user space, which leads to the first class of zero-copy techniques: letting data be transferred without passing through user space.

##### Using mmap

One way to reduce the number of copies is to call mmap() instead of read():

    buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, diskfd, 0);
    write(sockfd, buf, len);

When the application calls mmap(), the data on disk is copied into a kernel buffer by DMA. The operating system then shares this kernel buffer with the application, so the content does not have to be copied to user space. When the application then calls write(), the operating system copies the content of the kernel buffer directly to the socket buffer, entirely in kernel mode. Finally, the data in the socket buffer is sent to the network card.

[Figure: mmap]

Using mmap instead of read saves one copy, which improves efficiency noticeably when a lot of data is transferred. But mmap comes at a cost, and it has some hidden pitfalls.
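Before turning to those pitfalls, the mmap approach can be written out more completely. The following is a minimal sketch rather than a definitive implementation: the function name send_with_mmap is hypothetical, and the sketch assumes the whole file is mapped read-only in one piece and that out_fd is a blocking descriptor.

```c
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send the whole file behind file_fd to out_fd using mmap() + write().
 * Returns the number of bytes written, or -1 on error (errno is set). */
ssize_t send_with_mmap(int file_fd, int out_fd)
{
    struct stat st;
    if (fstat(file_fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0;                       /* mmap() rejects zero-length mappings */

    size_t len = (size_t)st.st_size;
    /* Share the kernel's cached pages with this process instead of
     * copying them into a private user buffer. */
    char *buf = mmap(NULL, len, PROT_READ, MAP_PRIVATE, file_fd, 0);
    if (buf == MAP_FAILED)
        return -1;

    size_t off = 0;
    while (off < len) {
        /* The only CPU copy left: mapped pages -> socket buffer. */
        ssize_t w = write(out_fd, buf + off, len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the write */
            int saved = errno;
            munmap(buf, len);
            errno = saved;              /* report the write error, not munmap's */
            return -1;
        }
        off += (size_t)w;
    }

    munmap(buf, len);
    return (ssize_t)off;
}
```

The sketch does nothing to protect against the file shrinking while it is mapped.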
The best-known pitfall is truncation: if your program has mapped a file and another process truncates it, the write system call is terminated by a SIGBUS signal because it accesses an address that is no longer valid. By default, SIGBUS kills your process and produces a core dump, which is costly if your server goes down this way.

Two solutions are commonly used to avoid this problem.

Install a signal handler for SIGBUS. When SIGBUS arrives, the handler simply returns; the write system call returns the number of bytes written before it was interrupted, and errno is set to success. This is a poor approach, though, because it does not solve the core of the problem.

Use file leases. This is the usual method. We ask the kernel for a lease on the file descriptor. When another process wants to truncate the file, the kernel sends us a real-time signal, RT_SIGNAL_LEASE, telling us that it is about to break the read or write lease we hold on the file. The write system call is then interrupted before the program touches invalid memory and gets killed by SIGBUS; write returns the number of bytes written and sets errno to success.

    /* RT_SIGNAL_LEASE is not a kernel constant; the program defines it
       as the real-time signal it wants to receive */
    if (fcntl(diskfd, F_SETSIG, RT_SIGNAL_LEASE) == -1) {
        perror("kernel lease set signal");
        return -1;
    }
    /* l_type is F_RDLCK or F_WRLCK to take a lease, F_UNLCK to release it */
    if (fcntl(diskfd, F_SETLEASE, l_type) == -1) {
        perror("kernel lease set type");
        return -1;
    }

##### Using sendfile

Starting with kernel version 2.1, Linux introduced sendfile to simplify the operation:

    #include <sys/sendfile.h>
    ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

The sendfile() system call transfers file content (bytes) from the input file descriptor in_fd to the output file descriptor out_fd. The descriptor out_fd must refer to a socket (since Linux 2.6.33 it may be any file), and the file referred to by in_fd must support mmap-like operations, so it cannot be a socket.
The requirements on in_fd and out_fd mean that sendfile is essentially limited to transferring data from a file to a socket, and not the other way around.

[Figure: sendfile system call process]

What happens if another process truncates the file while we are calling sendfile? If we have not installed any signal handler, the sendfile call simply returns the number of bytes it transferred before it was interrupted, and errno is set to success. If we take a lease on the file before calling sendfile, sendfile behaves the same way, and we also receive the RT_SIGNAL_LEASE signal.

So far we have reduced the number of copies, but one remains: the copy from the page cache to the socket buffer. Can this copy be eliminated as well?

With help from the hardware, it can. Instead of copying the data from the page cache to the socket buffer, we only need to append a buffer descriptor, together with the data length, to the socket buffer. The DMA controller can then gather the data directly from the page cache and send it to the network.

To summarize: the sendfile system call uses the DMA engine to copy the file content into a kernel buffer, then appends a buffer descriptor carrying the location and length of the data to the socket buffer. No data is copied from the kernel buffer to the socket buffer in this step. The DMA engine copies the data from the kernel buffer straight to the protocol engine, which avoids the last CPU copy.

[Figure: sendfile with DMA]

This gather-copy capability, however, requires support from the hardware and its driver.

##### Using splice

sendfile only applies to copying data from a file to a socket, which limits its scope.
Linux introduced the splice system call in version 2.6.17 to move data between two file descriptors:

    #define _GNU_SOURCE         /* See feature_test_macros(7) */
    #include <fcntl.h>
    ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

The splice call moves data between two file descriptors without copying it back and forth between kernel space and user space. It transfers up to len bytes from fd_in to fd_out, but one of the two descriptors must be a pipe, which is currently one of splice's limitations. The flags parameter is a bit mask that can combine the following values:

SPLICE_F_MOVE: try to move pages instead of copying them (only a hint to the kernel).
SPLICE_F_NONBLOCK: do not block on I/O.
SPLICE_F_MORE: more data will follow in a subsequent splice.
SPLICE_F_GIFT: unused for splice(); see vmsplice(2).
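Since one side of every splice call has to be a pipe, sending a file to a socket takes two steps: file to pipe, then pipe to socket. The following is a minimal sketch of that pattern; the function name splice_file_to_fd is hypothetical, and the sketch assumes blocking descriptors and reads in_fd from its current file offset.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd through an intermediate pipe.
 * Returns the number of bytes moved, or -1 on error (errno is set). */
ssize_t splice_file_to_fd(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    ssize_t total = 0;
    int err = 0;

    while (len > 0 && err == 0) {
        /* Step 1: file -> pipe. At most one pipe buffer's worth is moved. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n == 0)
            break;                      /* end of file */
        if (n < 0) {
            if (errno != EINTR)
                err = errno;            /* real error: the loop condition ends the loop */
            continue;                   /* EINTR: retry */
        }
        len -= (size_t)n;

        /* Step 2: pipe -> destination, until the pipe is drained. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE | SPLICE_F_MORE);
            if (m < 0 && errno == EINTR)
                continue;               /* interrupted, retry */
            if (m <= 0) {
                err = (m < 0) ? errno : EIO;
                break;
            }
            n -= m;
            total += m;
        }
    }

    close(p[0]);
    close(p[1]);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return total;
}
```

Creating a pipe for every transfer keeps the sketch short; a long-running server would more likely keep one pipe per connection or per worker and reuse it.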
splice builds on the pipe buffer mechanism in Linux, which is why at least one of the descriptors must be a pipe.

All of the zero-copy techniques above work by reducing the copying of data between user space and kernel space. Sometimes, however, data has to be copied between the two. In that case the only thing left to optimize is when the copy happens. Linux typically uses copy-on-write, often abbreviated COW, to reduce this overhead. For reasons of space this article does not cover copy-on-write in detail. Roughly: if several programs access the same piece of data at the same time, each of them holds a pointer to that data and, from its own point of view, owns the data independently. Only when a program needs to modify the data is the content copied into that program's own address space, at which point the copy becomes the program's private data. A program that never modifies the data never has to copy it, which reduces data copying. Copy-on-write could fill another article.

There are other zero-copy techniques as well. For example, opening a file with the O_DIRECT flag makes traditional Linux I/O perform direct I/O and bypass the kernel's automatic caching. There is also the still-immature fbufs technique. This article does not cover every zero-copy technique, only some common ones; interested readers can explore the rest on their own. Mature server projects often also modify the I/O-related parts of the kernel themselves to raise their data transfer rates.