1. What is copy_{to,from}_user()?
copy_{to,from}_user() is the bridge for data exchange between kernel space and user space, and all such data transfers should go through interfaces like it. But what exactly does it do? We raise the following questions:
Note: The code analysis in this article is based on Linux 4.18.0, and architecture-specific code is shown for ARM64.
1.1 copy_{to,from}_user() vs. memcpy()
Most blogs focus on the first point, and it appears to be widely accepted. However, people who rely on experiment tend toward the second view; after all, practice is the test of truth. Is the truth in the hands of a few, or does the majority see more clearly? I do not reject either view, nor can I guarantee which one is correct. A theory that was once impeccable may stop being correct as time passes or circumstances change; Newton's classical mechanics is an example (admittedly a bit of a stretch). Put plainly: the Linux codebase keeps changing, so a view that was once correct may or may not still be correct today. What follows is my own analysis, and it deserves the same skepticism.
1.2 Function definitions
First, compare the definitions of memcpy() and copy_{to,from}_user(). The parameters are almost the same: each takes a destination address, a source address, and the number of bytes to copy.

    static __always_inline unsigned long __must_check
    copy_to_user(void __user *to, const void *from, unsigned long n);

    static __always_inline unsigned long __must_check
    copy_from_user(void *to, const void __user *from, unsigned long n);

    void *memcpy(void *dest, const void *src, size_t len);

One thing is certain, however. memcpy() does not check whether the addresses it is given are legitimate. copy_{to,from}_user(), by contrast, first validates the user address with access_ok(), which checks that the whole range lies within the user address space. This is a simplification; see the source code for the full set of checks.
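The following is a minimal user-space sketch of the kind of range check access_ok() performs. The function name user_range_ok() and the 48-bit limit SIM_USER_ADDR_LIMIT are assumptions for illustration. The real arm64 check compares against the current thread's address limit and is written differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user address limit: a 48-bit user VA space (assumption). */
#define SIM_USER_ADDR_LIMIT (UINT64_C(1) << 48)

/* True if [addr, addr + size) lies entirely below limit.
 * Written as addr <= limit - size so that addr + size can never overflow. */
bool user_range_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
    if (size > limit)
        return false;
    return addr <= limit - size;
}
```

A copy routine would run a check like this before touching the user buffer. In Linux 4.18, when access_ok() fails, _copy_to_user() skips the copy and returns the full byte count as "not copied".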
After this brief comparison, let's look at the other differences and discuss the two points above. We start with the second point, since it is based on practice. In my tests the outcome falls into two cases.
In the first case, the memcpy() test runs without any problem. The test code is below; only the read handler of the file_operations for a proc file is shown:

    static ssize_t test_read(struct file *file, char __user *buf,
                             size_t len, loff_t *offset)
    {
            if (*offset > 0 || len < 5)     /* report EOF after the first read, or cat loops forever */
                    return 0;
            memcpy(buf, "test\n", 5);       /* copy_to_user(buf, "test\n", 5) */
            *offset += 5;
            return 5;
    }

We read the file with cat, which reaches test_read through the read system call with a 4 KB buffer. The test succeeded and the "test" string was read correctly, so the second point seems to hold.
We still need to verify further, though, because the first point claims that "this page fault exception must be explicitly repaired in kernel space." So we must also check the following case. buf has been allocated virtual address space in user space, but no mapping to physical memory has been established yet, so the kernel-mode access triggers a page fault. We would first need to create this condition, that is, obtain such a buf, and then run the test.
I did not run this test myself, because a conclusion already existed (and, honestly, because I found the condition tedious to construct). A friend of mine, Ackerman, also known as Teacher Song's "assistant teacher", ran this experiment. His conclusion: even when buf has no mapping to physical memory yet, the code runs normally.
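One way to construct the "allocated but not yet mapped" buffer is an anonymous mmap() that is never touched before read(). The VMA exists, but no physical page is allocated until the first access, so the kernel's copy into it takes a page fault. This is a user-space sketch, and the helper name read_into_unpopulated() is hypothetical. It reads from a pipe, which uses the kernel's regular copy path, rather than from the article's test_read proc file, whose path is not given.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Read from fd into a freshly mmap()ed anonymous page that has never been
 * touched, so the kernel's copy into it must fault the page in.
 * Copies at most out_len of the received bytes into out; returns read()'s result. */
ssize_t read_into_unpopulated(int fd, char *out, size_t out_len)
{
    size_t map_len = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Deliberately do NOT touch buf here: no physical page exists yet. */
    ssize_t n = read(fd, buf, map_len);
    if (n > 0)
        memcpy(out, buf, (size_t)n < out_len ? (size_t)n : out_len);

    munmap(buf, map_len);
    return n;
}
```

If read() returns the data, the kernel resolved the fault on its own, which matches the conclusion above.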
In other words, the page fault taken in kernel mode is handled by the kernel itself: it allocates physical memory, fills in the page table, and establishes the mapping. My reading of the code leads to the same conclusion. So far, memcpy() appears to work, although for safety, interfaces such as copy_{to,from}_user() are still recommended.
In the second case, the same test code does not run correctly and triggers a kernel oops. This test used a different kernel configuration from the first: CONFIG_ARM64_SW_TTBR0_PAN or CONFIG_ARM64_PAN was enabled (ARM64 platform). Both options prevent kernel mode from accessing the user address space directly. The only difference is that CONFIG_ARM64_SW_TTBR0_PAN emulates the feature in software, while CONFIG_ARM64_PAN uses the hardware PAN extension introduced in ARMv8.1.
Why do we need PAN (Privileged Access Never)? Data exchange between user space and kernel space can easily introduce security issues, so the kernel is not allowed to access user space casually. When it must, it has to turn PAN off through a dedicated interface.
PAN also enforces disciplined use of the kernel/user data-exchange interfaces. With PAN enabled, kernel and driver developers are forced to use safe interfaces such as copy_{to,from}_user(), which improves system security. A non-standard access such as memcpy() earns you a kernel oops.
Improper programming introduces security vulnerabilities. For example, the Linux kernel vulnerability CVE-2017-5123 allows privilege escalation. It was introduced because access_ok() was missing, so the address passed in by the user was never validated.
Therefore, to avoid introducing security issues in our own code, we must be extra careful whenever data crosses between kernel space and user space.
2. The principle of CONFIG_ARM64_SW_TTBR0_PAN
Now for the design behind CONFIG_ARM64_SW_TTBR0_PAN. ARM64 has two translation table base registers, ttbr0_el1 and ttbr1_el1. The processor decides which one to use from the high 16 bits of the 64-bit virtual address (assuming 48-bit virtual addresses):
- If they are all zeros, the address is a user space address and ttbr0_el1 is used.
- If they are all ones, it is a kernel address and ttbr1_el1 is used.
Therefore, on a process switch ARM64 only needs to change ttbr0_el1. ttbr1_el1 can stay unchanged because all processes share the same kernel address space.
When a process enters kernel mode (interrupt, exception, system call, etc.), how can we stop the kernel from accessing the user address space? The answer is simple: point ttbr0_el1 at an invalid mapping. For this purpose a special page table is prepared, consisting of 4 KB of memory filled entirely with zeros. When the process enters kernel mode, ttbr0_el1 is set to the address of this table. Every access to a user space address then faults, because every entry in the table is zero and therefore invalid. This special page table is allocated by the linker script:

    #define RESERVED_TTBR0_SIZE	(PAGE_SIZE)

    SECTIONS
    {
            reserved_ttbr0 = .;
            . += RESERVED_TTBR0_SIZE;
            swapper_pg_dir = .;
            . += SWAPPER_DIR_SIZE;
            swapper_pg_end = .;
    }

The special page table sits right next to the kernel page table. reserved_ttbr0 starts exactly RESERVED_TTBR0_SIZE (4 KB) below swapper_pg_dir, and the 4 KB starting at reserved_ttbr0 are zeroed.
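The address-selection rule and the effect of the zero-filled table can be modelled in a few lines. The model assumes 48-bit virtual addresses, a 4 KB granule (so the level-0 index is VA bits 47:39), and top-byte-ignore disabled. The helper names are hypothetical.

```c
#include <stdint.h>

#define VA_BITS 48               /* assumption: 48-bit VAs, no top-byte-ignore */
#define ENTRIES_PER_TABLE 512    /* 4 KB table / 8-byte descriptors */

/* Which base register translates va?
 * 0 -> ttbr0_el1 (user), 1 -> ttbr1_el1 (kernel),
 * -1 -> neither (mixed high bits: translation fault). */
int select_ttbr(uint64_t va)
{
    uint64_t top = va >> VA_BITS;
    if (top == 0)
        return 0;
    if (top == (UINT64_MAX >> VA_BITS))
        return 1;
    return -1;
}

/* Level-0 table index for a 4 KB granule: VA bits 47:39. */
unsigned level0_index(uint64_t va)
{
    return (unsigned)((va >> 39) & (ENTRIES_PER_TABLE - 1));
}

/* A descriptor is valid only if bit 0 is set. The reserved table is all
 * zeros, so any walk that starts there stops at an invalid entry. */
int descriptor_valid(uint64_t desc)
{
    return (int)(desc & 1);
}
```

Looking up any user address through an all-zero table yields an invalid descriptor. That is exactly why pointing ttbr0_el1 at reserved_ttbr0 blocks user access.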
When we enter kernel mode, __uaccess_ttbr0_disable switches ttbr0_el1 to disable access to user space addresses, and __uaccess_ttbr0_enable re-enables it when access is needed. Neither macro is complicated. Taking __uaccess_ttbr0_disable as an example, its definition is:

    .macro __uaccess_ttbr0_disable, tmp1
    mrs     \tmp1, ttbr1_el1                        // swapper_pg_dir (1)
    bic     \tmp1, \tmp1, #TTBR_ASID_MASK
    sub     \tmp1, \tmp1, #RESERVED_TTBR0_SIZE      // reserved_ttbr0 just before
                                                    // swapper_pg_dir (2)
    msr     ttbr0_el1, \tmp1                        // set reserved TTBR0_EL1 (3)
    isb
    add     \tmp1, \tmp1, #RESERVED_TTBR0_SIZE
    msr     ttbr1_el1, \tmp1                        // set reserved ASID
    isb
    .endm

(1) ttbr1_el1 holds the base address of swapper_pg_dir together with the ASID.
(2) After the ASID bits are cleared, subtracting RESERVED_TTBR0_SIZE yields the address of reserved_ttbr0.
(3) Writing that address into ttbr0_el1 makes every user space translation go through the zeroed table.
The final add/msr pair writes swapper_pg_dir back to ttbr1_el1 with the ASID field cleared, that is, with the reserved ASID.
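The register arithmetic of __uaccess_ttbr0_disable can be modelled in C with plain variables in place of system registers, so the isb barriers have no counterpart. struct ttbr_regs and the example addresses are hypothetical. TTBR_ASID_MASK and RESERVED_TTBR0_SIZE mirror the kernel constants under a 4 KB page size assumption.

```c
#include <stdint.h>

#define TTBR_ASID_MASK      (UINT64_C(0xffff) << 48)  /* ASID field: bits 63:48 */
#define RESERVED_TTBR0_SIZE UINT64_C(4096)            /* one 4 KB page (assumption) */

/* Hypothetical stand-in for the two translation table base registers. */
struct ttbr_regs {
    uint64_t ttbr0_el1;
    uint64_t ttbr1_el1;
};

/* Mirrors the steps of the __uaccess_ttbr0_disable macro. */
void sim_uaccess_ttbr0_disable(struct ttbr_regs *r)
{
    uint64_t tmp = r->ttbr1_el1;      /* (1) swapper_pg_dir | ASID */
    tmp &= ~TTBR_ASID_MASK;           /* bic: drop the ASID field */
    tmp -= RESERVED_TTBR0_SIZE;       /* (2) reserved_ttbr0 is just below */
    r->ttbr0_el1 = tmp;               /* (3) user walks now hit the zero table */
    tmp += RESERVED_TTBR0_SIZE;
    r->ttbr1_el1 = tmp;               /* swapper_pg_dir with ASID field cleared */
}
```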
A C implementation of __uaccess_ttbr0_disable also exists, in arch/arm64/include/asm/uaccess.h. How does kernel mode regain access to user space addresses? Simply by doing the reverse of __uaccess_ttbr0_disable, that is, by loading ttbr0_el1 with a valid page table base address. There is no need to go into detail here.
What matters now is this: when CONFIG_ARM64_SW_TTBR0_PAN is enabled, copy_{to,from}_user() allows kernel mode to access user space before copying and revokes that access after the copy completes. That is why copy_{to,from}_user() is the proper approach: it performs security checks and handles the access safely. This is its first advantage over memcpy(); another important one is introduced later.
We can now answer the question left over from the previous section: how can memcpy() still be used? The answer is now simple:
1. Call uaccess_enable_not_uao() to allow kernel access to user space.
2. Call memcpy().
3. Call uaccess_disable_not_uao() to revoke the access again.
3. Testing
The tests above all passed legal user space addresses. What is a legal user space address? Any address inside the virtual address space that user space requested through a system call is legal, regardless of whether physical pages have been allocated and mapped.
Since we are writing an interface, we must also consider robustness. We cannot assume that every parameter passed by the user is legal; we must anticipate invalid arguments and handle them in advance. First, we run the memcpy() test case with a random invalid address. This test triggers a kernel oops.
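The article defines a legal user address as one inside some region the process obtained from the kernel, whether or not it is mapped to physical memory. That definition can be expressed as a range check over the process's virtual memory areas. This is a simplified sketch: struct vma_range and user_range_is_legal() are hypothetical, VMAs are assumed sorted and non-overlapping, and the kernel's real logic also considers permissions and stack growth.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified VMA covering [start, end). */
struct vma_range {
    uint64_t start;
    uint64_t end;
};

/* Legal in the article's sense: every byte of [addr, addr + len) lies inside
 * some VMA (vmas sorted by start, non-overlapping), mapped or not. */
bool user_range_is_legal(const struct vma_range *vmas, size_t n,
                         uint64_t addr, uint64_t len)
{
    if (len == 0)
        return true;
    if (addr + len < addr)              /* range wraps around */
        return false;

    uint64_t end = addr + len;
    for (size_t i = 0; i < n && addr < end; i++) {
        if (vmas[i].end <= addr)
            continue;                   /* VMA lies entirely below addr */
        if (vmas[i].start > addr)
            return false;               /* hole before this VMA */
        addr = vmas[i].end;             /* covered up to this VMA's end */
    }
    return addr >= end;
}
```

Adjacent VMAs count as continuous coverage, so a range may span several of them. Any hole makes the whole range illegal.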
Next, we repeat the test with copy_to_user() instead of memcpy(). This time read() merely returns an error and no kernel oops occurs, which is exactly what we want: an application should never be able to trigger a kernel oops. How does this mechanism work? Take copy_to_user() as an example. The call chain is copy_to_user() -> _copy_to_user() -> raw_copy_to_user() -> __arch_copy_to_user(). On ARM64, __arch_copy_to_user() is implemented in assembly, and this code is the key:

    end     .req    x5
    ENTRY(__arch_copy_to_user)
            uaccess_enable_not_uao x3, x4, x5
            add     end, x0, x2
    #include "copy_template.S"
            uaccess_disable_not_uao x3, x4
            mov     x0, #0
            ret
    ENDPROC(__arch_copy_to_user)

            .section .fixup,"ax"
            .align  2
    9998:   sub     x0, end, dst                    // bytes not copied
            ret
            .previous
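The invalid-address case can be reproduced from user space without a custom module. The sketch below passes read() an address with nothing mapped behind it, on a pipe. The pipe code copies data to user space with the kernel's checked copy helpers, so the fault is caught and read() fails with EFAULT instead of crashing the kernel. The address 0x10 is an assumption: it lies below the usual mmap_min_addr, so nothing should be mapped there. To exercise the article's module instead, read its proc file, whose path is not given here.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Ask the kernel to copy into an address with no VMA behind it.
 * Returns read()'s errno, or 0 if read() unexpectedly succeeded. */
int read_into_bad_pointer(int fd)
{
    volatile uintptr_t bad_addr = 0x10;   /* assumption: unmapped address */
    void *bad = (void *)bad_addr;
    ssize_t n = read(fd, bad, 5);
    return n < 0 ? errno : 0;
}
```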
Compared with the earlier analysis, __arch_copy_to_user() is approximately equivalent to:

    uaccess_enable_not_uao();
    memcpy(ubuf, kbuf, size);      ==     __arch_copy_to_user(ubuf, kbuf, size);
    uaccess_disable_not_uao();

Let me insert a note here to explain why copy_template.S amounts to memcpy(). On ARM64, memcpy() is implemented in assembly in arch/arm64/lib/memcpy.S:

            .weak memcpy
    ENTRY(__memcpy)
    ENTRY(memcpy)
    #include "copy_template.S"
            ret
    ENDPIPROC(memcpy)
    ENDPROC(__memcpy)

So memcpy() and __memcpy() share the same definition. Because memcpy is declared weak, another definition can override it (a bit of a tangent).
Going a little further: why use assembly instead of the memcpy() in lib/string.c? To make memcpy() faster. The lib/string.c version copies one byte at a time, and even the best hardware can be held back by crude code. Most processors today are 32- or 64-bit, so they can copy 4, 8, or even 16 bytes at a time (subject to address alignment), which significantly improves speed. That is why ARM64 uses an assembly implementation. For details, see the blog post "memcpy optimization and implementation of ARM64".
Back to the main topic. To repeat: when kernel mode accesses a user space address and a page fault occurs, the kernel repairs the fault transparently as long as the address is legal. It allocates physical memory and establishes the page table mapping as if nothing had happened. If the user space address is illegal, however, the fault handler takes the second path and tries to recover through the exception fixup mechanism in __do_kernel_fault():

    static void __do_kernel_fault(unsigned long addr, unsigned int esr,
                                  struct pt_regs *regs)
    {
            /*
             * Are we prepared to handle this kernel fault?
             * We are almost certainly not prepared to handle instruction faults.
             */
            if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
                    return;
            /* ... */
    }

fixup_exception() in turn calls search_exception_tables(), which searches the __ex_table section. This section holds the exception table. Each entry stores the address of an instruction that may fault and its corresponding fixup address; for the user accesses in copy_template.S, the fixup is the 9998: sub x0, end, dst code above. When an entry is found, fixup_exception() writes the fixup address into the saved PC (regs->pc). When the exception returns, execution therefore resumes at the fixup code instead of at the faulting instruction.
In short, the search checks whether the exception table contains an entry for the faulting address; if it does, the fault can be repaired. Because 32-bit and 64-bit processors implement this differently, we start with the exception table on 32-bit processors.
The start and end of the __ex_table section are marked by __start___ex_table and __stop___ex_table (defined in include/asm-generic/vmlinux.lds.h). This region can be treated as an array of struct exception_table_entry, each recording a faulting address and its fixup address:

                           exception tables
    __start___ex_table --> +---------------+
                           |     entry     |
                           +---------------+
                           |     entry     |
                           +---------------+
                           |      ...      |
                           +---------------+
                           |     entry     |
                           +---------------+
                           |     entry     |
    __stop___ex_table  --> +---------------+

On a 32-bit processor, struct exception_table_entry is defined as:

    struct exception_table_entry
    {
            unsigned long insn, fixup;
    };

Note that on a 32-bit processor, unsigned long is 4 bytes. insn and fixup hold the faulting instruction address and its fixup address respectively.
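Once the PC has been redirected to the 9998 stub, __arch_copy_to_user() returns end - dst instead of 0. The toy model below, with the hypothetical helper sim_copy_to_user(), shows that contract byte by byte. A simulated fault at offset fault_at stops the copy, and the function returns the number of bytes not copied. The real copy_template.S moves larger chunks, so this is only a model of the return value, not of the copy itself.

```c
#include <stddef.h>

/* Toy model of __arch_copy_to_user's return value (hypothetical helper).
 * Pretend the destination faults at offset fault_at (use fault_at >= n for
 * "no fault"). On a fault, return what the .fixup stub computes: end - dst. */
size_t sim_copy_to_user(unsigned char *to, const unsigned char *from,
                        size_t n, size_t fault_at)
{
    unsigned char *dst = to;
    unsigned char *end = to + n;          /* add end, x0, x2 */

    while (dst < end) {
        if ((size_t)(dst - to) == fault_at)
            return (size_t)(end - dst);   /* 9998: sub x0, end, dst */
        *dst++ = *from++;
    }
    return 0;                             /* mov x0, #0 */
}
```

Callers typically treat any nonzero result as a failure, for example `if (copy_to_user(buf, src, len)) return -EFAULT;`, which is how read() ends up reporting an error instead of oopsing.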
Given a faulting address ex_addr, the corresponding fixup address is looked up as follows (0 is returned if none is found). Schematic code:

    unsigned long search_fixup_addr32(unsigned long ex_addr)
    {
            const struct exception_table_entry *e;

            for (e = __start___ex_table; e < __stop___ex_table; e++)
                    if (ex_addr == e->insn)
                            return e->fixup;
            return 0;
    }
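Here is a self-contained version of the 32-bit lookup, with the table passed in explicitly instead of through the linker symbols. The struct and function names are hypothetical. As a design note, the kernel keeps the exception table sorted and searches it with a binary search (lib/extable.c), so a binary-search variant is shown next to the schematic linear scan.

```c
#include <stddef.h>

/* 32-bit style entry: absolute addresses. */
struct exception_table_entry32 {
    unsigned long insn, fixup;
};

/* Linear scan, as in search_fixup_addr32 above. */
unsigned long search_fixup_addr32(const struct exception_table_entry32 *start,
                                  const struct exception_table_entry32 *stop,
                                  unsigned long ex_addr)
{
    for (const struct exception_table_entry32 *e = start; e < stop; e++)
        if (ex_addr == e->insn)
            return e->fixup;
    return 0;
}

/* Binary search over a table sorted by insn. */
unsigned long bsearch_fixup_addr32(const struct exception_table_entry32 *tbl,
                                   size_t n, unsigned long ex_addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tbl[mid].insn == ex_addr)
            return tbl[mid].fixup;
        if (tbl[mid].insn < ex_addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```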
On 64-bit processors, keeping this layout would double the memory needed for the exception table, because each address takes 8 bytes. The kernel therefore uses a different scheme. On 64-bit processors:

    struct exception_table_entry
    {
            int insn, fixup;
    };

Each entry occupies the same memory as on a 32-bit processor, so memory usage is unchanged. The meaning of insn and fixup changes, however. Each member now stores the offset of its target address (the faulting instruction or the fixup code, respectively) relative to the address of the member itself. This is a bit confusing, so here is the corresponding lookup, which finds the fixup address for a faulting address ex_addr (0 is returned if none is found):

    unsigned long search_fixup_addr64(unsigned long ex_addr)
    {
            const struct exception_table_entry *e;

            for (e = __start___ex_table; e < __stop___ex_table; e++)
                    if (ex_addr == (unsigned long)&e->insn + e->insn)
                            return (unsigned long)&e->fixup + e->fixup;
            return 0;
    }

The key question is therefore how exception_table_entry is constructed. We need to create an exception table entry for every instruction that accesses a user space address and insert it into the __ex_table section. Consider the following assembly. The addresses are made up, so don't worry about whether they are realistic; understanding the principle is what matters.

    0xffff000000000000: ldr x1, [x0]
    0xffff000000000004: add x1, x1, #0x10
    0xffff000000000008: ldr x2, [x0, #0x10]
    /* ... */
    0xffff000040000000: mov x0, #0xfffffffffffffff2    // -14, i.e. -EFAULT
    0xffff000040000004: ret

Assume x0 holds a user space address. We then need an exception table entry for the instruction at 0xffff000000000000, and we want a fault caused by an illegal x0 to continue at the fixup address 0xffff000040000000. To keep the arithmetic simple, assume this is the first entry and __start___ex_table is 0xffff000080000000. Then insn lives at 0xffff000080000000 and fixup at 0xffff000080000004, and the two members hold:

    insn  = 0xffff000000000000 - 0xffff000080000000 = -0x80000000   (stored as 0x80000000)
    fixup = 0xffff000040000000 - 0xffff000080000004 = -0x40000004   (stored as 0xbffffffc)

Both values are negative. An entry like this is created for every instruction in the copy_{to,from}_user() assembly that accesses a user space address, so the instruction at 0xffff000000000008 needs an entry as well.
So what exactly happens when kernel mode accesses an illegal user space address? Summarizing the analysis above:
1. The access faults.
2. do_page_fault() finds no valid mapping and hands over to __do_kernel_fault().
3. fixup_exception() finds the exception table entry for the faulting instruction and sets the saved PC to its fixup code.
4. The copy routine returns the number of bytes not copied instead of crashing the kernel.
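The following is a user-space sketch of the 64-bit, self-relative scheme:
- make_entry() stores offsets instead of addresses.
- search_fixup() is search_fixup_addr64 with explicit table bounds.
- sim_fixup_exception() mimics how the arm64 fault path redirects the saved PC.
- rel32() reproduces the worked example's arithmetic.

All names except struct exception_table_entry are hypothetical. The sketch assumes every target lies within 2 GB of its entry, which is what makes 32-bit offsets sufficient.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit layout: two 32-bit self-relative offsets per entry. */
struct exception_table_entry {
    int insn, fixup;
};

/* Hypothetical stand-in for the saved register state. */
struct sim_pt_regs {
    uintptr_t pc;
};

/* Encode absolute addresses as offsets from the members that store them.
 * Assumes both targets lie within +/-2 GB of the entry. */
void make_entry(struct exception_table_entry *e,
                uintptr_t insn_addr, uintptr_t fixup_addr)
{
    e->insn  = (int)((intptr_t)insn_addr  - (intptr_t)&e->insn);
    e->fixup = (int)((intptr_t)fixup_addr - (intptr_t)&e->fixup);
}

/* Decode each entry relative to its own address and compare. */
uintptr_t search_fixup(const struct exception_table_entry *start,
                       const struct exception_table_entry *stop,
                       uintptr_t ex_addr)
{
    for (const struct exception_table_entry *e = start; e < stop; e++)
        if (ex_addr == (uintptr_t)&e->insn + e->insn)
            return (uintptr_t)&e->fixup + e->fixup;
    return 0;
}

/* Rough analogue of fixup_exception(): on a hit, redirect the saved PC so the
 * exception returns into the fixup code. Returns 1 if the fault was fixed up. */
int sim_fixup_exception(struct sim_pt_regs *regs,
                        const struct exception_table_entry *start,
                        const struct exception_table_entry *stop)
{
    uintptr_t fixup = search_fixup(start, stop, regs->pc);
    if (!fixup)
        return 0;
    regs->pc = fixup;
    return 1;
}

/* 32-bit value stored for target when the field sits at field_addr. */
uint32_t rel32(uint64_t field_addr, uint64_t target)
{
    return (uint32_t)(target - field_addr);
}
```

With the worked example's addresses, rel32() yields 0x80000000 for insn and 0xbffffffc for fixup.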
4. Conclusion
It is time to review, and this is where the discussion of copy_{to,from}_user() ends. To summarize:
Whether a legal user space address is accessed from kernel mode or user mode, if the virtual address has not yet been mapped to physical memory, the page fault handling is almost the same: the kernel allocates physical memory and creates the mapping. In this case, memcpy() and copy_{to,from}_user() behave similarly.
When kernel mode accesses an illegal user space address, the fixup address is looked up from the faulting address. This repair does not establish any address mapping. Instead, it changes the address the exception returns to (the saved PC). memcpy() cannot do this.
When CONFIG_ARM64_SW_TTBR0_PAN or CONFIG_ARM64_PAN is enabled, only copy_{to,from}_user() can access user space; memcpy() triggers a kernel oops.
Finally, even if memcpy() works in some cases, it is still not recommended and is not good programming practice. For data exchange between user space and kernel space, we must use interfaces like copy_{to,from}_user(). Why "like"? Because there are other interfaces for kernel/user data exchange that are less famous than copy_{to,from}_user(), for example {get,put}_user().