assembly
Each Instruction Set Architecture (ISA) has its own assembly language, so there isn't a universal assembly. For example, given a piece of C code, GCC compiles it to assembly for the target ISA (e.g., x86, ARM, RISC-V), and GNU as then assembles that output into machine code.
GNU assembler as (sourceware binutils-gdb)
The GNU binutils project hosts a collection of binary tools, which includes as. The source code is mirrored on GitHub. The manual is here.
On macOS, as is not GNU as. Instead, it drives Clang's integrated assembler from LLVM (it is not llvm-as, which assembles LLVM IR rather than machine assembly). There are lots of differences.
First, .section .text works on Linux, but not on macOS. I got this error message:
hello.s:6:15: error: unexpected token in '.section' directive
.section .text
The correct way is .section __TEXT,__text,regular,pure_instructions. The Mach-O format is different from ELF. The parser code is here. Basically, it expects segment_name,section_name,... In ELF, we only need to specify the section name, and the linker knows which segment to place the section in. But in Mach-O, we need to be explicit. See the Apple developer guide. On the other hand, there is a universal way to make it work for both ELF and Mach-O: just use the .text directive. From the code, you can see that on Mach-O it is equivalent to the .section line above.
RISC-V
The best way to try out RISC-V is with riscv-tools. Check out the GitHub organization riscv-software-src. It contains the RISC-V installer, compiler, and emulator.
Using an M1 MacBook as an example:
brew tap riscv-software-src/riscv
brew install riscv-tools
Write a simple C program:
#include <stdio.h>
int main() {
    printf("hello world\n");
    return 0;
}
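Then compile it. The three commands translate the C file to assembly, assemble that into an object file, and link the executable: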
riscv64-unknown-elf-gcc -S hello.c
riscv64-unknown-elf-gcc -c hello.s -o hello.o
riscv64-unknown-elf-gcc hello.o -o hello
and run it with the spike simulator and the proxy kernel (pk):
$ spike pk hello
bbl loader
hello world
Registers and instruction set
See manual
amd64
Registers
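I had a hard time remembering the meaning of the registers in the amd64 architecture until I found this post. The summary below is quoted from it. It uses the 32-bit names; on amd64 the 64-bit versions are RAX, RBX, RSP, RIP, and so on: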
EAX, EBX, ECX, EDX = A, B, C, D; Note that the ‘A’ register holds function return values
ESI, EDI = Source, Destination (for string operations) - ECX may be the counter and EAX may be used, too.
ESP, EBP = Stack Pointer, Base Pointer
EIP = Instruction Pointer
CS, DS, SS, ES, FS, GS = Code, Data, Stack, and Extra segments, followed by F and G Segments
Syntax
x86 assembly language has two main syntax branches: Intel syntax and AT&T syntax. Check the wiki for details of their differences. The GNU assembler (gas or as) is probably the most popular assembler that uses AT&T syntax, and nasm is the most popular one that uses Intel syntax.
RIP-relative addressing
lea rsi, [rel msg]
References
- https://cs.lmu.edu/~ray/notes/nasmtutorial/
CFI (Call Frame Information) directives
Assembly Sections
.data
.text
.bss
FAQ
Difference between stack pointer and frame pointer
TODO: write it up.
How to compile with frame pointer enabled/disabled?