A brief analysis of the function calling process under the ARM architecture

1. Background knowledge

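1. Introduction to ARM64 registers

The registers that matter for this analysis are x29 (the frame pointer, FP), x30 (the link register, LR, which holds the return address), sp (the stack pointer), and pc (the program counter).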

2. Detailed explanation of the STP instruction (ARMv8 manual)

Let's first look at the 64-bit instruction format and the effect each form has on registers and memory.

Type 1 (post-index), STP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>

Store Xt1 and Xt2 at the address held in Xn|SP (Xt1 at the lower address, Xt2 8 bytes above it), then update Xn|SP to Xn|SP + imm.

Type 2 (pre-index), STP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

Store Xt1 and Xt2 at the address Xn|SP + imm, then update Xn|SP to that same address, Xn|SP + imm.

Type 3 (signed offset), STP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

Store Xt1 and Xt2 at the address Xn|SP + imm. Xn|SP itself is not modified.

The manual defines these three encodings. We will only discuss the last two, which are the ones that appear in the programs below.

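The pseudocode from the manual is as follows. The flags wback and postindex are set by the encoding: post-index sets both to TRUE, pre-index sets wback to TRUE and postindex to FALSE, and signed offset sets both to FALSE.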

// Shared decode for all encodings
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if L:opc<0> == '01' || opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = wback || n != 31;

// Operation for all encodings
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
boolean rt_unknown = FALSE;
if HaveMTEExt() then
    SetNotTagCheckedInstruction(!tag_checked);
if wback && (t == n || t2 == n) && n != 31 then
    Constraint c = ConstrainUnpredictable();
    assert c IN {Constraint_NONE, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_NONE rt_unknown = FALSE; // value stored is pre-writeback
        when Constraint_UNKNOWN rt_unknown = TRUE; // value stored is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if rt_unknown && t == n then
    data1 = bits(datasize) UNKNOWN;
else
    data1 = X[t];
if rt_unknown && t2 == n then
    data2 = bits(datasize) UNKNOWN;
else
    data2 = X[t2];
Mem[address, dbytes, AccType_NORMAL] = data1;
Mem[address+dbytes, dbytes, AccType_NORMAL] = data2;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

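The color highlighting of the original is lost here. The lines that matter for pushing a stack frame are the address calculation (address = address + offset when !postindex), the two Mem[] stores, and the final writeback to SP. For the meaning of other assembly instructions, please refer to the ARMv8 manual or search Baidu.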
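To make the three addressing forms concrete, here is a minimal Python sketch of the 64-bit STP behavior described by the pseudocode. It is a simplification, not the architecture's definition: the function name stp, the dictionary-based register file and memory, and the sample values are all assumptions made for illustration, and the immediate scaling, alignment checks, and UNPREDICTABLE cases are left out.

```python
MASK64 = (1 << 64) - 1

def stp(regs, mem, t1, t2, base, imm=0, mode="offset"):
    """Model a 64-bit STP that stores regs[t1] and regs[t2] as an adjacent pair.

    mode is "post" ([base], #imm), "pre" ([base, #imm]!) or "offset" ([base, #imm]).
    mem is a dict keyed by the address of each 8-byte word (a simplification).
    """
    wback = mode in ("post", "pre")
    postindex = mode == "post"
    address = regs[base]
    if not postindex:
        address = (address + imm) & MASK64
    data1, data2 = regs[t1], regs[t2]      # values are read before any writeback
    mem[address] = data1                   # first register at the lower address
    mem[(address + 8) & MASK64] = data2    # second register 8 bytes above it
    if wback:
        if postindex:
            address = (address + imm) & MASK64
        regs[base] = address
    return regs, mem

# Hypothetical register values, chosen only to make the effect visible.
regs = {"sp": 0x1000, "x29": 0xAAAA, "x30": 0xBBBB}
mem = {}
stp(regs, mem, "x29", "x30", "sp", -48, mode="pre")   # stp x29, x30, [sp,#-48]!
print(hex(regs["sp"]), {hex(a): hex(v) for a, v in mem.items()})
```

With the pre-index form, sp moves down by 48 bytes first and the pair is then stored at the new sp, which is why the saved x30 always sits at sp + 8 right after such a prologue.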

2. An example

Now that we are familiar with the above parts, let's look at an example:

The C code is as follows:

The disassembly of the relevant functions is as follows (usually only the first two instructions of each function are involved in setting up the stack frame):

main / f3 / f4 / strlen

Running the program under gdb shows that strlen triggers a segmentation fault (SIGSEGV), which kills the process.

The binary was not stripped after compilation, so the ELF file still contains its symbols.

Check the register state (info register), paying attention to four registers: x29, x30, sp, and pc.

A core idea: the CPU executes instructions, not C code, and function calls and returns are in fact pushes onto and pops off the thread's stack.

Next, let's see how the call relationship above is laid out in the current task's stack:

The relationship between function calls on the stack (a call pushes a frame and the stack address decreases; a return pops the frame and the address increases):

The following is the process of deriving the stack by hand (this is the important part).

Let's look back at the earlier disassembly:

main / f3 / f4 / strlen

Start from the current sp. Frame 0 is strlen, which has not allocated a stack frame of its own, so the return address into its caller is still held in x30. From x30 it can be deduced that frame 1 is f3.

The starting entry assembly of function f3:

(gdb) x/2i f3
   0x400600 <f3>: stp x29, x30, [sp,#-48]!
   0x400604 <f3+4>: mov x29, sp

The prologue shows that f3 allocates 48 bytes of stack. Therefore, the top of the stack of frame 2 is the current sp + 48 bytes: 0xfffffffff2c0

(gdb) x/gx 0xfffffffff2c0+8
0xfffffffff2c8: 0x000000000040065c
(gdb) x/i 0x000000000040065c
   0x40065c <f4+36>: mov w0, #0x0 // #0
The return address stored at 0xfffffffff2c0 + 8 is 0x000000000040065c -> <f4+36>, so the function of frame 2 is f4.

Continue the derivation from sp = 0xfffffffff2c0.

The entry assembly of function f4 is:

(gdb) x/2i f4
   0x400638 <f4>: stp x29, x30, [sp,#-48]!
   0x40063c <f4+4>: mov x29, sp

The prologue shows that f4 also allocates 48 bytes of stack. Therefore, the top of the stack of frame 3 is 0xfffffffff2c0 + 48 bytes: 0xfffffffff2f0

Reading the return address at 0xfffffffff2f0 + 8 identifies the function of frame 3:
(gdb) x/gx 0xfffffffff2f0+8
0xfffffffff2f8: 0x0000000000400684
(gdb) x/i 0x0000000000400684
   0x400684 <main+28>: mov w0, #0x0 // #0

Therefore, the function of frame 3 is main, and the top of the stack corresponding to main is 0xfffffffff320.

This concludes the derivation (interested readers can continue and see how libc starts main).
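The arithmetic of this walkthrough can be replayed in a few lines of Python. This is only a sketch of the hand derivation, not a debugger. The two stack words and the f3 and f4 addresses are copied from the gdb output above. The entry address of main is inferred from <main+28>, and the last 48-byte step is taken from the 0xfffffffff320 result in the text rather than from a prologue. The helper names symbolize and derive are hypothetical.

```python
# 64-bit words copied from the gdb output above (address -> value).
stack = {
    0xfffffffff2c8: 0x40065c,
    0xfffffffff2f8: 0x400684,
}

# Function entry addresses. f3 and f4 come from the disassembly above;
# main is inferred from <main+28> = 0x400684.
symbols = {"f3": 0x400600, "f4": 0x400638, "main": 0x400684 - 28}

def symbolize(addr):
    """Return 'name+offset' for the nearest symbol at or below addr."""
    name, start = max(
        ((n, s) for n, s in symbols.items() if s <= addr),
        key=lambda item: item[1],
    )
    return f"{name}+{addr - start}"

def derive(base, frame_sizes):
    """Read the saved x30 at base + 8, then step up by each frame size."""
    found = []
    for size in frame_sizes:
        found.append((base, symbolize(stack[base + 8])))
        base += size
    return found, base

# 48 bytes for f4's frame, then 48 bytes for the final step as stated in the text.
found, top_of_main = derive(0xfffffffff2c0, [48, 48])
for base, caller in found:
    print(f"{base:#x}: return address in {caller}")
print(f"next boundary: {top_of_main:#x}")
```

Each step needs only two inputs, the current frame boundary and the frame size read from the prologue, which is why the register state and the prologue disassembly are enough to rebuild the call chain.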

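Summary:

The keys to deriving the stack by hand:

  • The current state on site (the register values and the stack memory at the time of the fault)
  • Familiarity with how the CPU architecture allocates stack frames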

3. Practical explanation

The following core dump was captured on site. As you can see, none of the symbols can be resolved. Even after loading the symbol table, gdb still cannot parse the actual call stack.

(gdb) bt
#0 0x0000ffffaeb067bc in ?? () from /lib64/libc.so.6
#1 0x0000aaaad15cf000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

First look at the output of info register, paying attention to the values of the four registers x29, x30, sp, and pc

Deriving the task stack:

First dump the memory at sp:

The figure below actually shows the final result. Let's describe in detail how to derive it.

pc points to the instruction currently being executed. If the current function has not yet allocated a stack frame, x30 generally holds the address of the instruction in the caller that follows the call to the current function. Disassembling that address resolves it to the following function:

(gdb) x/i 0xaaaacd3de4fc
   0xaaaacd3de4fc <PGXCNodeConnStr(char const*, int, char const*, char const*, char const*, char const*, int, char const*)+108>: mov x27, x0

After finding the function at the top of the stack, check how it sets up its stack frame:

(gdb) x/6i PGXCNodeConnStr
   0xaaaacd3de490 <PGXCNodeConnStr(char const*, int, char const*, char const*, char const*, char const*, int, char const*)>: sub sp, sp, #0xd0
   0xaaaacd3de494 <PGXCNodeConnStr(char const*, int, char const*, char const*, char const*, char const*, int, char const*)+4>: stp x29, x30, [sp,#80]
   0xaaaacd3de498 <PGXCNodeConnStr(char const*, int, char const*, char const*, char const*, char const*, int, char const*)+8>: add x29, sp, #0x50

The prologue shows that this function allocates 0xd0 bytes and saves x29 and x30 at sp + 80 (0x50). The saved frame record of the previous frame is therefore at 0xfffec4cebd40 + 0x50 = 0xfffec4cebd90, and the bottom of this frame is at 0xfffec4cebd40 + 0xd0 = 0xfffec4cebe10
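Reading the frame size and the frame record offset out of a prologue is mechanical, so it can be sketched as a small parser. The following is a rough illustration that only recognizes the two instruction shapes appearing in this article (sub sp, sp, #imm and stp x29, x30, [sp,#imm] with or without the exclamation mark). The function name frame_layout and both regular expressions are assumptions of this sketch, and real prologues can contain other patterns that it does not handle.

```python
import re

SUB_SP = re.compile(r"sub\s+sp,\s*sp,\s*#(0x[0-9a-f]+|\d+)")
STP_FP = re.compile(
    r"stp\s+x29,\s*x30,\s*\[sp(?:,\s*#(-?(?:0x[0-9a-f]+|\d+)))?\](!?)"
)

def frame_layout(prologue):
    """Return (frame_size, record_offset) for a list of prologue instructions.

    record_offset is where x29/x30 are saved, relative to sp after the prologue.
    """
    size = 0
    for insn in prologue:
        m = SUB_SP.search(insn)
        if m:
            size += int(m.group(1), 0)      # plain stack allocation
            continue
        m = STP_FP.search(insn)
        if m:
            imm = int(m.group(1) or "0", 0)
            if m.group(2):                  # pre-index: sp moves by imm first
                return size - imm, 0
            return size, imm                # signed offset: sp is unchanged
    raise ValueError("no x29/x30 save found in prologue")

sp = 0xfffec4cebd40  # sp value used in the text
size, record = frame_layout([
    "sub sp, sp, #0xd0",
    "stp x29, x30, [sp,#80]",
    "add x29, sp, #0x50",
])
print(f"frame record at {sp + record:#x}, frame bottom at {sp + size:#x}")
```

The saved return address is then the 8-byte word at the frame record address plus 8, since x30 is the second register of the pair.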

This gives us the top of the stack of the next frame and the return address (LR) into the caller. Resolving that address yields the function build_node_conn_str

(gdb) x/i 0x0000aaaacd414e08
   0xaaaacd414e08 <build_node_conn_str(Oid, DatabasePool*)+224>: mov x21, x0

Repeating the derivation above, we can see that build_node_conn_str allocates a 176-byte stack frame.

(gdb) x/4i build_node_conn_str
   0xaaaacd414d28 <build_node_conn_str(Oid, DatabasePool*)>: stp x29, x30, [sp,#-176]!
   0xaaaacd414d2c <build_node_conn_str(Oid, DatabasePool*)+4>: mov x29, sp

So the next frame boundary is 0xfffec4cebe10 + 176 = 0xfffec4cebec0

The return address stored at 0xfffec4cebe10 + 8 resolves to the caller, reload_database_pools.

Next, look at reload_database_pools:

(gdb) x/8i reload_database_pools
   0xaaaacd4225e8 <reload_database_pools(PoolAgent*)>: sub sp, sp, #0x1c0
   0xaaaacd4225ec <reload_database_pools(PoolAgent*)+4>: adrp x5, 0xaaaad15cf000
   0xaaaacd4225f0 <reload_database_pools(PoolAgent*)+8>: adrp x3, 0xaaaacf0ed000
   0xaaaacd4225f4 <reload_database_pools(PoolAgent*)+12>: adrp x4, 0xaaaaceeed000 <_ZN4llvm18ConvertUTF8toUTF16EPPKhS1_PPtS3_NS_15ConversionFlagsE>
   0xaaaacd4225f8 <reload_database_pools(PoolAgent*)+16>: add x3, x3, #0x9e0
   0xaaaacd4225fc <reload_database_pools(PoolAgent*)+20>: adrp x1, 0xaaaacf0ee000 <_ZZ25PoolManagerGetConnectionsP4ListS0_E8__func__+24>
   0xaaaacd422600 <reload_database_pools(PoolAgent*)+24>: stp x29, x30, [sp,#-96]!

The function actually allocates 0x1c0 + 96 (0x60) = 0x220 bytes of stack, so the bottom of this frame is 0xfffec4cebec0 + 0x220 = 0xfffec4cec0e0

Therefore, the basic call relationship is as follows: reload_database_pools -> build_node_conn_str -> PGXCNodeConnStr -> the libc function at pc.

The above is basically enough to analyze the problem, so there is no need to continue the derivation.
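The frame boundaries derived in this section can be cross-checked with a short script. This is a sketch that only repeats the additions from the text: the starting sp and the three frame sizes are the values quoted above, and the helper name layout is hypothetical.

```python
# (function, frame size in bytes), innermost first, as read from the prologues above.
frames = [
    ("PGXCNodeConnStr", 0xd0),
    ("build_node_conn_str", 176),
    ("reload_database_pools", 0x1c0 + 96),
]

def layout(sp, frames):
    """Return (name, top, bottom) for each frame, walking toward higher addresses."""
    rows = []
    for name, size in frames:
        rows.append((name, sp, sp + size))
        sp += size  # the bottom of this frame is the top of the caller's frame
    return rows

for name, top, bottom in layout(0xfffec4cebd40, frames):
    print(f"{name:24s} top={top:#x} bottom={bottom:#x} size={bottom - top:#x}")
```

Tabulating the frames this way makes it easy to spot an arithmetic slip before trusting a return address read from the wrong slot.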

TIPS: Under the ARM architecture, a function entry generally uses the following instruction to save its frame record:

stp x29, x30, [sp,#immediate]! (with or without the exclamation mark)

Therefore, each frame stores the stack top address (the saved x29) and the return address (LR) of the previous frame. Once the stack top of the innermost frame, frame 0, has been located accurately, all call relationships can be deduced quickly (the parts circled with red dashed lines in the figure). Resolving addresses to function names depends on the symbol table: as long as the symbol sections of the original ELF file have not been stripped, the corresponding function symbols can be found (check with readelf -S).

Once the frames have been found, the contents of each frame, combined with the disassembly, are usually enough to deduce the values of local variables.
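When every function saves x29 and x30 as a pair and points x29 at that pair, the saved x29 values form a linked list that can be followed without knowing any frame size. The sketch below illustrates that idea on made-up data. The function name backtrace, the dictionary standing in for memory, and all addresses and return values are hypothetical and are not taken from the core dump above.

```python
def backtrace(mem, fp, max_depth=64):
    """Follow the x29 chain; each record is [saved x29, saved x30]."""
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        saved_fp = mem[fp]          # caller's frame pointer
        lr = mem[fp + 8]            # return address into the caller
        return_addresses.append(lr)
        # The stack grows downward, so a caller's record must sit at a higher
        # address. Anything else suggests a corrupt chain, so stop there.
        if saved_fp != 0 and saved_fp <= fp:
            break
        fp = saved_fp
    return return_addresses

# Made-up memory image: three frame records linked through the saved x29.
mem = {
    0x7000: 0x7030, 0x7008: 0x1111,
    0x7030: 0x7060, 0x7038: 0x2222,
    0x7060: 0x0,    0x7068: 0x3333,
}
print([hex(lr) for lr in backtrace(mem, 0x7000)])
```

When the chain is intact this is the quickest route. The size-based derivation used in this article remains useful when x29 is not reliable, for example when the fault happens before a function has finished its prologue.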
