assembly

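Each Instruction Set Architecture (ISA) has its own assembly language, so there is no universal assembly. For example, a compiler such as GCC translates the same piece of C code into different assembly for each target ISA (e.g., x86, ARM, RISC-V), and an assembler built for that target, such as GNU as, then assembles it into machine code.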

GNU assembler as (sourceware binutils-gdb)

The GNU binutils project hosts a collection of binary tools, including as. The source code is mirrored on GitHub. The manual is here.

On macOS, as is not GNU as. Instead, it is a front end to the integrated assembler of LLVM (Clang). There are many differences.

First, .section .text works on Linux but not on macOS, where I got this error message:

hello.s:6:15: error: unexpected token in '.section' directive
.section .text

The correct way is .section __TEXT,__text,regular,pure_instructions, because the Mach-O format differs from ELF. The parser code is here. Basically, it expects segment_name,section_name,.... In ELF, we only need to specify the section name, and the linker decides which segment the section goes into. In Mach-O, we have to name the segment explicitly. See the Apple developer guide. There is also a portable way that works for both ELF and Mach-O: just use the .text directive. From the code, you can see that on Mach-O it is equivalent to the .section line above.

RISC-V

The best way to try out RISC-V is with riscv-tools. Check out the GitHub organization riscv-software-src, which contains a RISC-V installer, compiler, and emulator.

Using an M1 MacBook as an example:

brew tap riscv-software-src/riscv
brew install riscv-tools

Write a simple C program

#include <stdio.h>

int main() {
  printf("hello world\n");
  return 0;
}

Then compile it

riscv64-unknown-elf-gcc -S hello.c
riscv64-unknown-elf-gcc -c hello.s -o hello.o
riscv64-unknown-elf-gcc hello.o -o hello

and run it

$ spike pk hello
bbl loader
hello world

Registers and instruction set

See manual

amd64

Registers

I had a hard time remembering what the registers in the amd64 architecture mean until I found this post. I quote its summary below (it uses the 32-bit names; the 64-bit counterparts are RAX, RBX, RSP, RIP, and so on):

EAX, EBX, ECX, EDX = A, B, C, D; Note that the ‘A’ register holds function return values

ESI, EDI = Source, Destination (for string operations) - ECX may be the counter and EAX may be used, too.

ESP, EBP = Stack Pointer, Base Pointer

EIP = Instruction Pointer

CS, DS, SS, ES, FS, GS = Code, Data, Stack, and Extra segments, followed by F and G Segments

Syntax

x86 assembly language has two main syntax branches: Intel syntax and AT&T syntax. Check the wiki for details of their differences. The GNU assembler (gas or as) is probably the most popular assembler that uses AT&T syntax, and nasm is the most popular one that uses Intel syntax.

RIP-relative addressing

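In NASM syntax, the rel keyword makes the address RIP-relative, that is, encoded as an offset from the address of the next instruction:

 lea rsi, [rel msg]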

References

  • https://cs.lmu.edu/~ray/notes/nasmtutorial/

CFI (Call Frame Information) directives

Assembly Sections

  • .data
  • .text
  • .bss

FAQ

Difference between stack pointer and frame pointer

TODO: write it up.

How to compile with frame pointer enabled/disabled?

This post is licensed under CC BY 4.0 by the author.