A brief analysis of the function calling process under the ARM architecture

1. Background knowledge

1. Introduction to ARM64 registers

2. Detailed explanation of the STP instruction (Armv8 manual)

Let's first look at the instruction formats (64-bit variants) and how each one affects registers and memory when it executes.

Type 1 (post-index): STP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>

Store Xt1 and Xt2 to memory at the address held in Xn|SP, then update Xn|SP to Xn|SP + imm.

Type 2 (pre-index): STP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

Store Xt1 and Xt2 to memory at address Xn|SP + imm, then update Xn|SP to that same address (Xn|SP + imm).

Type 3 (signed offset): STP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

Store Xt1 and Xt2 to memory at address Xn|SP + imm. Xn|SP itself is not changed.

The manual defines three encodings of this instruction; we only discuss the last two, which are the ones that appear in the programs below.
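To make the difference between the three forms concrete, here is a minimal Python sketch. It is a toy model rather than an emulator: memory is a plain dict, registers are dict entries, and the function name stp and the mode strings are made up for this illustration.

```python
# Toy model of the three 64-bit STP addressing forms (illustrative only).
# Memory is a dict from address to 64-bit value.

def stp(mem, regs, t1, t2, base, imm, mode):
    """Store regs[t1] and regs[t2] as a pair; update regs[base] if the form writes back.

    mode is "post" (Type 1), "pre" (Type 2) or "offset" (Type 3).
    Returns the address that was used for the store.
    """
    address = regs[base]
    if mode in ("pre", "offset"):
        address += imm              # offset is applied before the store
    mem[address] = regs[t1]         # first register at the lower address
    mem[address + 8] = regs[t2]     # second register 8 bytes above it
    if mode == "post":
        regs[base] += imm           # store first, then move the base
    elif mode == "pre":
        regs[base] = address        # base becomes the address just used
    return address


regs = {"x29": 0x1111, "x30": 0x2222, "sp": 0x1000}   # hypothetical values
mem = {}
stp(mem, regs, "x29", "x30", "sp", -48, "pre")        # stp x29, x30, [sp,#-48]!
print(hex(regs["sp"]), hex(mem[0xfd0]), hex(mem[0xfd8]))
# prints: 0xfd0 0x1111 0x2222
```

With "offset" instead of "pre", the same call would store at the same address but leave sp at 0x1000.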

The pseudocode is as follows:

Shared decode for all encodings
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if L:opc<0> == '01' || opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = wback || n != 31;
Operation for all encodings
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
boolean rt_unknown = FALSE;
if HaveMTEExt() then
    SetNotTagCheckedInstruction(!tag_checked);
if wback && (t == n || t2 == n) && n != 31 then
    Constraint c = ConstrainUnpredictable();
    assert c IN {Constraint_NONE, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_NONE rt_unknown = FALSE; // value stored is pre-writeback
        when Constraint_UNKNOWN rt_unknown = TRUE; // value stored is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if rt_unknown && t == n then
    data1 = bits(datasize) UNKNOWN;
else
    data1 = X[t];
if rt_unknown && t2 == n then
    data2 = bits(datasize) UNKNOWN;
else
    data2 = X[t2];
Mem[address, dbytes, AccType_NORMAL] = data1;
Mem[address+dbytes, dbytes, AccType_NORMAL] = data2;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

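The parts of the pseudocode that matter for setting up a stack frame (highlighted in red in the original) are the address calculation, the two Mem[] stores, and the write-back of the new address to SP. For the meaning of other assembly instructions, refer to the Armv8 manual or search online.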
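The decode step that turns the 7-bit immediate into a byte offset can be checked by hand. The sketch below follows the three pseudocode lines for scale, SignExtend and LSL; the function name stp_offset is made up for this illustration, and the sample value 0x7A is simply the 7-bit two's-complement encoding of -6.

```python
def stp_offset(imm7, opc):
    """Return the byte offset encoded by imm7, following the decode pseudocode."""
    scale = 2 + ((opc >> 1) & 1)                    # scale = 2 + UInt(opc<1>)
    signed = imm7 - 128 if imm7 & 0x40 else imm7    # SignExtend(imm7, 64)
    return signed << scale                          # LSL(..., scale)


# For the 64-bit form, opc<1> is 1, so scale is 3 and the immediate counts 8-byte units.
print(stp_offset(0x7A, 0b10))   # -6 * 8, as in stp x29, x30, [sp,#-48]!
print(stp_offset(10, 0b10))     # 10 * 8, as in stp x29, x30, [sp,#80]
# prints: -48
#         80
```

Because imm7 is a signed 7-bit value scaled by 8, the 64-bit form can only express offsets that are multiples of 8 between -512 and 504.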

2. An example

Now that we are familiar with the above parts, let's look at an example:

The C code is as follows:

The disassembly of the relevant functions is as follows (each function usually has only two instructions that set up its stack frame):

main\f3\f4\strlen 

Running the program under gdb shows that the call to strlen triggers a segmentation fault, which kills the process.

The binary was not stripped after compilation, so the ELF file still contains its symbols.

Check the register state (info register), paying attention to four registers: x29, x30, sp, and pc.

A core idea: the CPU executes instructions, not C code, and function calls and returns are really just pushes onto and pops off the thread's stack.

Next, let's see how the call chain above is laid out on the current task's stack:

How function calls map onto the stack (a call pushes a frame and the stack address decreases; a return pops the frame and the address increases):
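The push-on-call, pop-on-return behaviour can be traced with a small Python model. This is a sketch under assumptions: the starting register values are hypothetical, and the epilogue is assumed to be the matching ldp x29, x30, [sp],#48, which does not appear in the disassembly shown in this article.

```python
def prologue(cpu, mem, frame_size):
    """Model 'stp x29, x30, [sp,#-frame_size]!' followed by 'mov x29, sp'."""
    cpu["sp"] -= frame_size              # call: stack grows toward lower addresses
    mem[cpu["sp"]] = cpu["x29"]          # caller's frame pointer
    mem[cpu["sp"] + 8] = cpu["x30"]      # return address into the caller
    cpu["x29"] = cpu["sp"]


def epilogue(cpu, mem, frame_size):
    """Model 'ldp x29, x30, [sp],#frame_size' (assumed matching epilogue)."""
    cpu["x29"] = mem[cpu["sp"]]
    cpu["x30"] = mem[cpu["sp"] + 8]
    cpu["sp"] += frame_size              # return: sp moves back up


cpu = {"sp": 0x2000, "x29": 0x2000, "x30": 0x400100}   # hypothetical state
mem = {}
trace = [cpu["sp"]]
prologue(cpu, mem, 48)                   # enter an outer function
trace.append(cpu["sp"])
cpu["x30"] = 0x400200                    # a 'bl' to an inner function overwrites x30
prologue(cpu, mem, 48)                   # enter the inner function
trace.append(cpu["sp"])
epilogue(cpu, mem, 48)                   # inner function returns
trace.append(cpu["sp"])
epilogue(cpu, mem, 48)                   # outer function returns
trace.append(cpu["sp"])
print([hex(v) for v in trace])
# prints: ['0x2000', '0x1fd0', '0x1fa0', '0x1fd0', '0x2000']
```

After both returns, x30 again holds 0x400100, the value it had before the outer call, because each frame saved its own copy.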

The following is the process of reconstructing the call stack by hand (the key part).

Let's look back at the earlier disassembly:

main\f3\f4\strlen 

Start from the current sp. Frame 0 is strlen, which has not allocated a stack frame, so the return address into its caller is still held in x30. From x30 we can deduce that frame 1 is f3.

The starting entry assembly of function f3:

(gdb) x/2i f3
   0x400600 <f3>: stp x29, x30, [sp,#-48]!
   0x400604 <f3+4>: mov x29, sp

This shows that f3 allocates 48 bytes of stack. The top of the stack for frame 2 is therefore the current sp + 48 bytes: 0xfffffffff2c0

(gdb) x/gx 0xfffffffff2c0+8
0xfffffffff2c8: 0x000000000040065c
(gdb) x/i 0x000000000040065c
   0x40065c <f4+36>: mov w0, #0x0 // #0
The function for frame 2 comes from the return address stored at that address + 8: 0x000000000040065c -> <f4+36>

Continue unwinding from sp = 0xfffffffff2c0.

The starting entry assembly of function f4 is:

(gdb) x/2i f4
   0x400638 <f4>: stp x29, x30, [sp,#-48]!
   0x40063c <f4+4>: mov x29, sp

This shows that f4 also allocates 48 bytes of stack. The top of the stack for frame 3 is therefore 0xfffffffff2c0 + 48 bytes: 0xfffffffff2f0

As found above, the function for frame 2 is at 0xfffffffff2c0 + 8: 0x000000000040065c -> <f4+36>. The same lookup at 0xfffffffff2f0 + 8 gives the function for frame 3:
(gdb) x/gx 0xfffffffff2f0+8
0xfffffffff2f8: 0x0000000000400684
(gdb) x/i 0x0000000000400684
   0x400684 <main+28>: mov w0, #0x0 // #0

Therefore the function for frame 3 is main, and the top of the stack corresponding to main is 0xfffffffff320.
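The derivation above can be replayed as a short Python sketch. The memory values and the f3 and f4 start addresses are copied from the gdb session; the starting sp 0xfffffffff290 is 0xfffffffff2c0 - 48, and the start address of main is inferred from <main+28> at 0x400684. The dict and the helper names are stand-ins invented for this illustration, not a gdb API.

```python
import bisect

# Return-address slots read with "x/gx" in the session above.
MEMORY = {
    0xfffffffff2c8: 0x40065c,
    0xfffffffff2f8: 0x400684,
}
# Function start addresses, sorted. main is inferred (0x400684 - 28).
SYMBOLS = [(0x400600, "f3"), (0x400638, "f4"), (0x400668, "main")]


def symbolize(addr):
    """Resolve an address to 'name+offset' using the nearest preceding symbol."""
    starts = [start for start, _ in SYMBOLS]
    start, name = SYMBOLS[bisect.bisect_right(starts, addr) - 1]
    return f"{name}+{addr - start}"


def walk(sp, frame_sizes):
    """Step over each frame and resolve the 8-byte value at (new sp + 8)."""
    results = []
    for size in frame_sizes:
        sp += size                                   # skip one frame
        results.append((sp, symbolize(MEMORY[sp + 8])))
    return results


for base, symbol in walk(0xfffffffff290, [48, 48]):
    print(hex(base), symbol)
# prints: 0xfffffffff2c0 f4+36
#         0xfffffffff2f0 main+28
```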

This concludes the derivation. (Interested readers can continue it to see how libc starts main.)

Summary:

The keys to reconstructing a call stack by hand:

  • The state captured at the point of failure (registers and stack memory)
  • Familiarity with how the CPU architecture allocates stack frames

3. Practical explanation

The following core dump was collected on site. As you can see, none of the symbols can be resolved. Even after loading the symbol table, gdb still cannot parse the actual call stack.

(gdb) bt
#0 0x0000ffffaeb067bc in ?? () from /lib64/libc.so.6
#1 0x0000aaaad15cf000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

First look at info register and note the values of four registers: x29, x30, sp, and pc.

Deriving the task stack:

First dump the memory at sp:

The figure below already shows the result. Let's go through how to derive it in detail.

pc points to the instruction currently executing. If the current function has not yet allocated its stack frame, x30 generally holds the address of the instruction that follows the call in the previous frame. Disassembling that address resolves it to the following function:

(gdb) x/i 0xaaaacd3de4fc
   0xaaaacd3de4fc <PGXCNodeConnStr(char const*, int, char const*, char const*, char const*, char const*, int, char const*)+108>: mov x27, x0

Having found the function at the top of the stack, check how it sets up its stack frame:

(gdb) x/6i PGXCNodeConnStr
   0xaaaacd3de490 <PGXCNodeConnStr(char const*, int, char const*, char const*, char const*, char const*, int, char const*)>: sub sp, sp, #0xd0
   0xaaaacd3de494 <PGXCNodeConnStr(char const*, int, char const*, char const*, char const*, char const*, int, char const*)+4>: stp x29, x30, [sp,#80]
   0xaaaacd3de498 <PGXCNodeConnStr(char const*, int, char const*, char const*, char const*, char const*, int, char const*)+8>: add x29, sp, #0x50

This shows that the previous frame's x29 and the return address (x30) are saved at the current sp + 0xd0 - 0x80, that is 0xfffec4cebd40 + 0x50 = 0xfffec4cebd90 (0x50 is the #80 offset in the stp), and that the bottom of this frame is at 0xfffec4cebd40 + 0xd0 = 0xfffec4cebe10.
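The arithmetic for this kind of prologue (a sub sp followed by an stp at a positive offset) can be written as a small Python helper. The function name and the returned tuple are made up for this illustration; the numbers are the ones from the disassembly above.

```python
def split_prologue_frame(sp, frame_size, record_offset):
    """Locate the saved x29/x30 pair and the caller's sp for a prologue of the form
    'sub sp, sp, #frame_size' followed by 'stp x29, x30, [sp,#record_offset]'."""
    record = sp + record_offset      # saved x29 is stored here
    saved_lr = record + 8            # saved x30 (return address) is 8 bytes above
    caller_sp = sp + frame_size      # value of sp before the 'sub'
    return record, saved_lr, caller_sp


record, saved_lr, caller_sp = split_prologue_frame(0xfffec4cebd40, 0xd0, 80)
print(hex(record), hex(saved_lr), hex(caller_sp))
# prints: 0xfffec4cebd90 0xfffec4cebd98 0xfffec4cebe10
```

The second value is the address to read with x/gx to obtain the return address into the caller.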

This gives us both the top of the stack for the next frame and the saved LR, which is the return address into the caller. Resolving that address yields the function build_node_conn_str:

(gdb) x/i 0x0000aaaacd414e08
   0xaaaacd414e08 <build_node_conn_str(Oid, DatabasePool*)+224>: mov x21, x0

Repeating the derivation, we can see that build_node_conn_str allocates a 176-byte stack frame.

(gdb) x/4i build_node_conn_str
   0xaaaacd414d28 <build_node_conn_str(Oid, DatabasePool*)>: stp x29, x30, [sp,#-176]!
   0xaaaacd414d2c <build_node_conn_str(Oid, DatabasePool*)+4>: mov x29, sp

So continue with 0xfffec4cebe10 + 176 = 0xfffec4cebec0.

The return address stored at 0xfffec4cebe10 + 8 resolves to the caller, reload_database_pools.

Next, look at reload_database_pools:

(gdb) x/8i reload_database_pools
   0xaaaacd4225e8 <reload_database_pools(PoolAgent*)>: sub sp, sp, #0x1c0
   0xaaaacd4225ec <reload_database_pools(PoolAgent*)+4>: adrp x5, 0xaaaad15cf000
   0xaaaacd4225f0 <reload_database_pools(PoolAgent*)+8>: adrp x3, 0xaaaacf0ed000
   0xaaaacd4225f4 <reload_database_pools(PoolAgent*)+12>: adrp x4, 0xaaaaceeed000 <_ZN4llvm18ConvertUTF8toUTF16EPPKhS1_PPtS3_NS_15ConversionFlagsE>
   0xaaaacd4225f8 <reload_database_pools(PoolAgent*)+16>: add x3, x3, #0x9e0
   0xaaaacd4225fc <reload_database_pools(PoolAgent*)+20>: adrp x1, 0xaaaacf0ee000 <_ZZ25PoolManagerGetConnectionsP4ListS0_E8__func__+24>
   0xaaaacd422600 <reload_database_pools(PoolAgent*)+24>: stp x29, x30, [sp,#-96]!

The prologue allocates 0x1c0 + 96 = 0x220 bytes in total, so the bottom of this frame is at 0xfffec4cebec0 + 0x220 = 0xfffec4cec0e0.
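The frame boundaries derived so far can be recomputed in one pass. In this Python sketch each function is listed with the stack adjustments made by its prologue, as read from the disassembly above; the table layout and the function name frame_bounds are assumptions of the illustration. It needs Python 3.8 or later for the initial argument of accumulate.

```python
from itertools import accumulate

# (function, stack adjustments in bytes made by its prologue)
FRAMES = [
    ("PGXCNodeConnStr", [0xd0]),              # sub sp, sp, #0xd0
    ("build_node_conn_str", [176]),           # stp x29, x30, [sp,#-176]!
    ("reload_database_pools", [0x1c0, 96]),   # sub sp, sp, #0x1c0 + stp ... [sp,#-96]!
]


def frame_bounds(sp, frames):
    """Return (name, lowest address, address just past the frame) per frame."""
    sizes = [sum(adjustments) for _, adjustments in frames]
    edges = list(accumulate(sizes, initial=sp))   # running stack addresses
    return [(name, edges[i], edges[i + 1]) for i, (name, _) in enumerate(frames)]


for name, low, high in frame_bounds(0xfffec4cebd40, FRAMES):
    print(f"{name}: {low:#x} .. {high:#x}")
# prints: PGXCNodeConnStr: 0xfffec4cebd40 .. 0xfffec4cebe10
#         build_node_conn_str: 0xfffec4cebe10 .. 0xfffec4cebec0
#         reload_database_pools: 0xfffec4cebec0 .. 0xfffec4cec0e0
```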

The basic structure of the call chain is therefore as follows:

This is usually enough to analyze the problem, so there is no need to derive further.

TIP: Under the ARM architecture, function prologues generally use this instruction:

stp x29, x30, [sp,#immediate]! (with or without the exclamation mark)

Each frame therefore stores the previous frame's stack-top address and LR. Once the top of the stack of the innermost frame (frame 0) is located accurately, the whole call chain can be deduced quickly (the parts circled with red dashed lines in the original figure). Resolving addresses to function names depends on the symbol table: as long as the symbol sections of the original ELF file have not been stripped, the corresponding function symbols can be found (check with readelf -S).
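Reading the frame size off a prologue is mechanical enough to script. The following Python sketch scans disassembly lines like the ones printed by x/i in this article and handles only the two instructions seen here, sub sp, sp, #imm and stp x29, x30, [sp,#imm] with or without the exclamation mark. The regular expressions and the function name are assumptions of this illustration; other prologue shapes are not covered.

```python
import re

SUB_SP = re.compile(r"sub\s+sp,\s*sp,\s*#(0x[0-9a-fA-F]+|\d+)")
STP_FP_LR = re.compile(
    r"stp\s+x29,\s*x30,\s*\[sp(?:,\s*#(-?(?:0x[0-9a-fA-F]+|\d+)))?\](!?)"
)


def analyze_prologue(lines):
    """Return (frame size in bytes, offset of the saved x29/x30 pair from the final sp)."""
    frame_size = 0
    record_offset = None
    for line in lines:
        match = SUB_SP.search(line)
        if match:
            frame_size += int(match.group(1), 0)      # sub sp: plain allocation
            continue
        match = STP_FP_LR.search(line)
        if match:
            imm = int(match.group(1) or "0", 0)
            if match.group(2) == "!":                 # pre-index: allocates as well
                frame_size += -imm
                record_offset = 0
            else:                                     # signed offset: sp unchanged
                record_offset = imm
            break                                     # the pair is saved; stop scanning
    return frame_size, record_offset


print(analyze_prologue(["stp x29, x30, [sp,#-48]!", "mov x29, sp"]))
print(analyze_prologue(["sub sp, sp, #0xd0", "stp x29, x30, [sp,#80]"]))
# prints: (48, 0)
#         (208, 80)
```

Lines that match neither pattern, such as the adrp and add instructions in reload_database_pools, are skipped, so that prologue yields a frame size of 0x220.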

Once the frames are located, the contents of each frame, combined with the disassembly, are usually enough to deduce the values of local variables.
