assembly
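Each Instruction Set Architecture (ISA) has its own assembly language, so there is no universal assembly. The GNU assembler as can be built for many target ISAs (e.g., x86, ARM, RISC-V): given assembly source written for a target, it assembles it into machine code for that ISA. C code has to be translated to assembly by a compiler such as gcc first.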
GNU assembler as (sourceware binutils-gdb)
The GNU binutils project hosts a collection of binary tools, including as. The source code is mirrored on GitHub. The manual is here.
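On macOS, as is not GNU as. Instead, it runs the assembler built into Clang/LLVM. (This is not the same tool as llvm-as, which assembles LLVM IR rather than machine assembly.) There are many differences between the two.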
First, .section .text works on Linux, but not on macOS. I got this error message:
hello.s:6:15: error: unexpected token in '.section' directive
.section .text
The correct way is .section __TEXT,__text,regular,pure_instructions because the Mach-O format is different from ELF. The parser code is here. Basically, the directive expects arguments of the form segment_name,section_name,... (segment first, then section). In ELF, we only need to specify the section name, and the linker decides which segment the section goes into. In Mach-O, we must name the segment explicitly. See the Apple developer guide. On the other hand, there is a portable way that works for both ELF and Mach-O: just use the .text directive. From the code, you can see that it is equivalent to the above.
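The same naming difference shows up from C when a symbol is placed in a named section. The following is a minimal sketch, assuming GCC or Clang (the section attribute is a compiler extension). The section names and the identifiers answer, my_section_name, and read_answer are made up for illustration.

```c
/* Mach-O section specifiers are "segment,section"; ELF needs only the
 * section name. Both names below are hypothetical examples. */
#if defined(__APPLE__)
#define MY_SECTION "__DATA,__mydata"
#else
#define MY_SECTION ".mydata"
#endif

/* Ask the compiler to emit this variable into the custom section.
 * "used" keeps it from being discarded as unreferenced. */
__attribute__((section(MY_SECTION), used))
int answer = 42;

/* Return the section specifier chosen for the current object format. */
const char *my_section_name(void) {
    return MY_SECTION;
}

/* Read the variable back; its placement does not change how C code uses it. */
int read_answer(void) {
    return answer;
}
```

Inspecting the resulting object file with objdump -h (ELF) or otool -l (Mach-O) should show the custom section.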
RISC-V
An easy way to try out RISC-V is with riscv-tools. Check out the GitHub organization riscv-software-src, which contains the RISC-V installer, compiler, and emulator.
Using an M1 MacBook as an example:
brew tap riscv-software-src/riscv
brew install riscv-tools
Write a simple C program
#include <stdio.h>

int main() {
    printf("hello world\n");
    return 0;
}
Then compile it
riscv64-unknown-elf-gcc -S hello.c
riscv64-unknown-elf-gcc -c hello.s -o hello.o
riscv64-unknown-elf-gcc hello.o -o hello
and run it
$ spike pk hello
bbl loader
hello world
Registers and instruction set
See manual
amd64
Registers
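I had a hard time remembering the meaning of the registers in the amd64 architecture until I found this post. I quote its summary below. The names shown are the 32-bit forms; in 64-bit mode the full-width registers are named RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, and RIP.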
EAX, EBX, ECX, EDX = A, B, C, D; Note that the ‘A’ register holds function return values
ESI, EDI = Source, Destination (for string operations) - ECX may be the counter and EAX may be used, too.
ESP, EBP = Stack Pointer, Base Pointer
EIP = Instruction Pointer
CS, DS, SS, ES, FS, GS = Code, Data, Stack, and Extra segments, followed by F and G Segments
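As a rough mental model of how the 64-bit and 32-bit names relate, the sketch below mimics how RAX, EAX, AX, and AL name progressively narrower low parts of the same register. It is plain C, not real register access. The union type reg_a and the helper functions are invented for this illustration, and the layout assumes a little-endian host such as x86-64.

```c
#include <stdint.h>

/* Hypothetical model of one general-purpose register.
 * On a little-endian host every member aliases the low bytes of rax. */
typedef union {
    uint64_t rax; /* full 64-bit register */
    uint32_t eax; /* low 32 bits */
    uint16_t ax;  /* low 16 bits */
    uint8_t  al;  /* low 8 bits */
} reg_a;

/* Store a 64-bit value and read it back through the 32-bit view. */
uint32_t low32(uint64_t value) {
    reg_a r;
    r.rax = value;
    return r.eax;
}

/* Same idea for the 16-bit view. */
uint16_t low16(uint64_t value) {
    reg_a r;
    r.rax = value;
    return r.ax;
}

/* Same idea for the 8-bit view. */
uint8_t low8(uint64_t value) {
    reg_a r;
    r.rax = value;
    return r.al;
}
```

One thing the model does not capture: on real hardware, writing EAX clears the upper 32 bits of RAX, whereas writing AX or AL leaves the remaining bits untouched.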
Syntax
x86 assembly language has two main syntax branches: Intel syntax and AT&T syntax. See the wiki for details of their differences. The GNU assembler (gas or as) is probably the most popular assembler that uses AT&T syntax, and nasm is the most popular one that uses Intel syntax.
RIP-relative addressing
lea rsi, [rel msg]
References
- https://cs.lmu.edu/~ray/notes/nasmtutorial/
Aarch64
AArch64 and arm64 are two names for the same 64-bit ARM architecture. The instruction set used in AArch64 is called A64.
Two different flavors of prologue and epilogue commonly appear online:
sub sp, sp, #16 // Allocate 16 bytes for frame record
stp x29, x30, [sp] // Save x29 (FP) and x30 (LR) to stack
mov x29, sp // Set FP (frame pointer) to current SP
ldp x29, x30, [sp] // x29 = [sp]; x30 = [sp + 8]
add sp, sp, #16
ret
and
sub sp, sp, #32 // Allocate 32 bytes for stack frame
stp x29, x30, [sp, #16] // Save x29 (FP) and x30 (LR) at offset 16
add x29, sp, #16 // Set FP to sp + 16 (to point to saved x29/x30)
ldp x29, x30, [sp, #16]
add sp, sp, #32
ret
The former is a basic stack frame setup: the frame record (the saved x29 and x30) is the whole frame. The latter is a more advanced frame layout, because the 16 bytes in [sp, sp+16), just below the frame record, can hold local variables. Why 16 bytes? The number is only an example. A compiler sizes this area to fit the function's locals and rounds the frame up to a multiple of 16, because AArch64 requires sp to be 16-byte aligned whenever it is used to access memory. A function with many locals gets a larger frame, and a function with none can use the basic layout without wasting stack space.
See reference.
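To see where locals land relative to the frame pointer on your own toolchain, a small C probe can help. This is a sketch under assumptions: it relies on the GCC/Clang builtin __builtin_frame_address, the function name local_offset_from_fp is made up, and the sign and size of the result depend on the compiler, flags, and target.

```c
#include <stdint.h>

/* Return (address of a local buffer) - (this function's frame address).
 * A negative result means the locals sit below the frame pointer, as in
 * the second layout; a positive result means they sit above it.
 * noinline keeps the function from being merged into its caller. */
__attribute__((noinline))
intptr_t local_offset_from_fp(void) {
    volatile char buf[16];  /* 16 bytes of locals, like the example frame */
    buf[0] = 0;             /* touch the buffer so it is not optimized out */
    uintptr_t fp = (uintptr_t)__builtin_frame_address(0);
    uintptr_t local = (uintptr_t)&buf[0];
    return (intptr_t)(local - fp);
}
```

Printing the result from main under different compilers or optimization levels gives a quick view of which layout was chosen.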
CFI (Call Frame Information) directives
Assembly Sections
.data
.text
.bss
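As a rough guide to what usually ends up in each of these sections, here is a minimal C sketch. The variable and function names are made up, and the placements in the comments describe typical compiler behavior rather than a guarantee.

```c
/* Initialized, writable global: typically placed in .data. */
int counter = 5;

/* Zero-initialized static: typically placed in .bss, which takes no space
 * in the object file and is zero-filled when the program is loaded. */
static int scratch[4];

/* Function code: placed in .text. */
int bump(void) {
    scratch[0] += 1;
    return counter + scratch[0];
}
```

Compiling with -S and looking for the .data, .bss, and .text directives around each symbol, or running size on the object file, shows the split.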
FAQ
Difference between stack pointer and frame pointer
TODO: write it up.
How to compile with frame pointer enabled/disabled?
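With GCC and Clang, the usual switches are -fno-omit-frame-pointer (keep it) and -fomit-frame-pointer (allow the compiler to drop it). The sketch below is a small non-leaf function to try them on. The file name fp_demo.c and the function names are hypothetical, and the generated assembly varies with compiler version, optimization level, and target.

```c
/* fp_demo.c (hypothetical file name)
 *
 * Keep the frame pointer:
 *   cc -O1 -fno-omit-frame-pointer -S fp_demo.c -o with_fp.s
 * Let the compiler omit it:
 *   cc -O1 -fomit-frame-pointer -S fp_demo.c -o without_fp.s
 *
 * Comparing the two .s files shows whether the prologue saves and sets up
 * the frame pointer register (rbp on amd64, x29 on AArch64).
 */

/* A trivial callee. */
int add_one(int x) {
    return x + 1;
}

/* Non-leaf function: it calls through a pointer, so its standalone copy
 * has to emit real calls and is a good place to look at the prologue. */
int apply_twice(int (*f)(int), int x) {
    return f(f(x));
}
```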