Post

Linker

Why linkers?

I became interested in linker by reading Ian Lance Taylor’s blog series. He was writing a new linker gold for binutils-gdb. There are lots of things I do not understand from his blogs, so I tried to find a book that systematically talks about linkers. That is “Linkers and Loaders” by John R. Levine. But some parts of the book is outdated and he spent large paragraphs discussing object file formats that I am not interested in at all. Later on, I found Ulrich’s paper How to write shared libraries.

All the names mentioned above are big figures in the history of C/C++ tool chain. Sometimes I wonder how much time and effort I need before reaching similar levels as them!

路漫漫其修远兮,吾将上下而求索

Linkers and loaders

Recently, I am reading book “Linkers and Loaders” by John R. Levine. Chapter 4 talks about different sections in ELF object format. It says in C++ initializers and finalizers are put into .init and .fini sections respectively. However, this is outdated. Checkout this post instead.

In chapter 6 - “Libraries”, John criticised the search order design of linker when loading libraries. As claimed by John, loading by groups instead of tweaking command line arguments order is much better. Actually, --start-group and --end-group are already supported in ld linker.

strong and weak symbols

Weak external symbols are treated as normal global symbols if some input file happens to define them, or zero otherwise.

There a few good blogs talking about it:

  • https://devblogs.microsoft.com/oldnewthing/20130108-00/?p=5623
  • https://leondong1993.github.io/2017/04/strong-weak-symbol/)

Relocation

Cite from “Linkers and loaders”:

We use relocation to refer both to the process of adjusting program addresses to account for non-zero segment origins, and the process of resolving references to external symbols, since the two are frequently handled together.

Position independent code (PIC)

When loader loads the executable from disk, it cannot guarantee to put the binary at a fix memory address. This means that the loader need to relocate all data references inside the binary. If the memory chunk starts at 5000. It will need to add this number to data addresses in this binary. The associated problem is efficiency. If two programs loads the same library into different memory locations, then how to share the same binary.

The solution in ELF position independent code.

You can learn more about loaders and PIC in chapter 8 of “Linkers and Loaders”.

.sa and stub files

https://stackoverflow.com/questions/8876231/what-is-a-sa-file-for-gcc-shared-library

ld options

All options can be found here. Here, I document some of them to enhance my understanding.

-e symbol_name: Specifies the entry point of a main executable. In Linux, the default entry name is _start. In MacOS, it is start.

--start-group and --end-group: The specified archives are searched repeatedly until no new undefined references are created. This is used to resolve circular dependency. See this stack overflow answer.

--start-lib and --end-lib: You can pass a .a archive file to linker. But this is not convenient in some cases. For example, if you have a bunch of object files and you want linker to treat them as a archive file as whole, then you need to archive them first and delete the archive after linking. You can see more details from Fangrui Song’s post.

--whole-archive: Not all object files inside an archive file will be linked to the final executable. Only the ones that are actually used will be linked! This will cause global variable initialization error sometimes.

--hash-style: possible values are sysv, gnu and both. Read more about the difference from below articles:

  • https://flapenguin.me/elf-dt-hash
  • https://flapenguin.me/elf-dt-gnu-hash

ld.so

ld.so is the dynamic linker.

LD_LOAD_NOW: resolve all symbols at program startup instead of deferring to the first time being used.

LD_DEBUG: print out debug information about the dynamic linker.

readelf usage

-p --string-dump=<number|name>: Dump the contents of section <number|name> as strings. Ex: get the dynamic linker for binary

1
2
3
4
# readelf -p .interp /bin/ls

String dump of section '.interp':
  [     0]  /lib64/ld-linux-x86-64.so.2

Gold

Build

MacOS

Checkout the binutils-gdb. In the root folder, run

1
./configure --prefix=/Users/xiongding/tmp/ --enable-gold=yes --disable-nls --with-gmp=/opt/homebrew/Cellar/gmp/6.2.1_1/ --with-mpfr=/opt/homebrew/Cellar/mpfr/4.2.0-p4

The make

1
2
brew install texinfo
make -j

The above steps do not work. Probably because the message during config.

1
2
3
*** This configuration is not supported in the following subdirectories:
     gdbserver ld gas gdb gprof sim
    (Any other directories should still work fine.)

Ubuntu

1
2
3
apt-get install -y build-essential libmpc-dev texinfo flex bison
./configure --enable-gold
make -j

The build process failed initially as some tool is missing. After installing these tools, it failed again due to some error message related to bison input file. I guess an interrupted build may created some bad bison file, so I rm the whole folder and checkout it again. Then build succeeded.

Code deep dive

Initial tasks: read symbols -> add symbols -> layout output sections.

Middle tasks:

Final tasks: write output to file.

Reference

  • https://www.airs.com/blog/archives/53
  • https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/34417.pdf
This post is licensed under CC BY 4.0 by the author.