SiFive - August 21, 2017

All Aboard, Part 2: Relocations in ELF Toolchains

Our first stop on our exploration of the RISC-V toolchain will be an overview of ELF relocations and how the toolchain uses them. We'll save linker relaxations and their impact on performance for a follow-up blog post so this one doesn't get too long. The example has been carefully constructed to be unrelaxable so as to avoid confusion. Additionally, we're only going to discuss the relocations used by statically linked executables, skipping position independent executables and thread local storage -- like linker relaxation, those warrant whole posts of their own. There will be a lot more to come about relocations in later blog posts.

An Example of a Relocation in a C Program

Relocations are a concept that exists due to the split between the compiler and the linker that is present in most toolchains. While the specifics of this article will apply only to ELF-based RISC-V toolchains (i.e., GCC+binutils or LLVM), the general concept of relocations exists in farther-reaching compilers like Hotspot. Since relocations exist to pass information between the compiler and linker, let's first look at how a simple program is compiled. Take the following C code:

long global_symbol[2];

int main() {
  return global_symbol[0] != 0;
}

Even though a single GCC invocation can produce a binary for this simple case, under the covers the GCC driver script is actually running the preprocessor, then the compiler, then the assembler and finally the linker. The --save-temps argument to GCC allows users to see all these intermediate files, and is a useful argument for poking around inside the toolchain.

$ riscv64-unknown-linux-gnu-gcc relocation.c -o relocation -O3 --save-temps

Each step in this run of the GCC wrapper script generates a file:

  • relocation.i: The preprocessed source, which expands any preprocessor directives (things like #include or #ifdef).
  • relocation.s: The output of the actual compiler, which is an assembly file (a text file in the RISC-V assembly format).
  • relocation.o: The output of the assembler, which is an un-linked object file (an ELF file, but not an executable ELF).
  • relocation: The output of the linker, which is a linked executable (an executable ELF file).

The first step is to run the preprocessor. Since this is a simple source file with no preprocessor macros, the preprocessor run is pretty boring: all it does is emit some directives to be used if debugging information is later generated:

$ cat relocation.i
# 1 "relocation.c"
# 1 "built-in"
# 1 "command-line"
# 31 "command-line"
# 1 "/scratch/palmer/work/upstream/riscv-gnu-toolchain/build/install/sysroot/usr/include/stdc-predef.h" 1 3 4
# 32 "command-line" 2
# 1 "relocation.c"
long global_symbol[2];

int main() {
  return global_symbol[0] != 0;
}

The preprocessed output is then fed through the compiler, which generates an assembly file. It is at this point that we begin to see why relocations are necessary. This file is plain text containing RISC-V assembly code and is therefore easy to read, so let's take a look right now:

$ cat relocation.s
main:
  lui   a5,%hi(global_symbol)
  ld    a0,%lo(global_symbol)(a5)
  snez  a0,a0
  ret

If you're not accustomed to reading the assembly output from RISC-V's GCC port then this might look a bit odd: there's an additional pair of addressing modes that aren't listed anywhere in the RISC-V instruction manual and don't really look like they could be sensibly implemented in hardware: %hi(global_symbol) and %lo(global_symbol)(a5).

These addressing modes exist to allow the compiler to address global symbols. The fundamental problem with addressing global symbols is that the compiler must emit assembly instructions in order to access said symbols, but the actual address of those global symbols cannot be known until link time. As a concrete example, try to figure out what bits the compiler would emit for the lui that addresses global_symbol: it's an impossible task, because the address has not been chosen yet.

Relocations resolve this discrepancy: when the compiler is unable to know the bits that should be emitted as part of a particular instruction, it instead just emits arbitrary bits for that instruction and also emits a relocation entry. This relocation entry points to the bits that will be emitted and contains enough information for the linker to fill out those bits.

The specifics of this are probably best explained by example, so let's go through the simple program above to see how it all works. The next link in the toolchain is the assembler, which takes in the assembly file from above and produces an ELF object file that has not yet been linked. You can examine these object files with objdump, which I've done below:

$ riscv64-unknown-linux-gnu-objdump -d -t -r relocation.o

relocation.o:     file format elf64-littleriscv

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 relocation.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .text.startup  0000000000000000 .text.startup
0000000000000000 l    d  .comment       0000000000000000 .comment
0000000000000000 g     F .text.startup  000000000000000e main
0000000000000010       O *COM*  0000000000000008 global_symbol

Disassembly of section .text.startup:

0000000000000000 <main>:
   0:   000007b7                lui     a5,0x0
                        0: R_RISCV_HI20 global_symbol
                        0: R_RISCV_RELAX        *ABS*
   4:   0007b503                ld      a0,0(a5) # 0 <main>
                        4: R_RISCV_LO12_I       global_symbol
                        4: R_RISCV_RELAX        *ABS*
   8:   00a03533                snez    a0,a0
   c:   8082                    ret

This is the first point at which you get to explicitly see relocations (which are only shown when the -r argument is passed to objdump). Here we can see four RISC-V-specific relocations in two pairs: an R_RISCV_HI20+R_RISCV_RELAX pair for the lui and an R_RISCV_LO12_I+R_RISCV_RELAX pair for the ld. The R_RISCV_RELAX relocations exist solely to signify that it is legal to perform linker relaxation on the previous relocation. Since we're not talking about linker relaxation in this blog entry, we can just ignore those entries for now.

The other two relocations pair explicitly with an addressing mode present in the RISC-V ISA: R_RISCV_HI20 pairs with a U-format immediate while R_RISCV_LO12_I pairs with an I-format immediate. In general, you'll find that every addressing mode with an immediate will have at least one relocation that fills out that immediate -- sometimes there'll be a handful more if that instruction format is used to link against more complicated forms of symbols as well (for example, PIC or TLS relocations).

Before we get too deep into relocations, let's quickly examine how the toolchain works when it's possible to fill out a relocation correctly. The next link in the toolchain is the linker, which consumes the relocations generated by the assembler to fill out the relevant bits in the output ELF executable. The program now has all the glibc startup code so it's become quite large. Thus, I'm only posting the relevant snippets below:

$ riscv64-unknown-linux-gnu-objdump -d -t -r relocation
relocation:     file format elf64-littleriscv

SYMBOL TABLE:
0000000000012038 g     O .bss 0000000000000010              global_symbol
...

Disassembly of section .text:

0000000000010330 <main>:
 10330:       67c9                    lui     a5,0x12
 10332:       0387b503                ld      a0,56(a5) # 12038 <global_symbol>
 10336:       00a03533                snez    a0,a0
 1033a:       8082                    ret

As you can see, the symbol table now has an actual address for global_symbol, the instructions that were referenced by the relocations have some non-zero bits filled out to reference global_symbol, and the relocations have been dropped from the ELF file as they're no longer necessary. This is only strictly the case because we have a statically linked symbol; relocations against dynamic symbols are instead deferred to the loader.

The relocation truncated to fit Error Message

Now that you know a bit about what relocations are we can discuss most people's only exposure to relocations: the relocation truncated to fit error message that appears when linking. It's hard to explain this message to people who don't understand relocations, but if you understand what a relocation is then it's not actually that tricky of an error message.

In order to explain the error message, we'll start with an extremely simple program. In this case we don't want anything from the C library to show up in our error message so we're defining _start instead of main and then avoiding any standard library objects by passing -nostdlib -nostartfiles to GCC -- this program won't actually work, but it'll serve to explain what's going on. Moving the text section with -Wl,-Ttext-segment,0x80000000 is what actually triggers the error; you'll see why below:

$ cat reloc_fail.c
long global_symbol;
int _start() {
  return global_symbol;
}
$ riscv64-unknown-linux-gnu-gcc reloc_fail.c -o reloc_fail -O3 -nostartfiles -nostdlib --save-temps  -Wl,-Ttext-segment,0x80000000
reloc_fail.o: In function `_start':
reloc_fail.c:(.text+0x0): relocation truncated to fit: R_RISCV_HI20 against symbol `global_symbol' defined in COMMON section in reloc_fail.o
/scratch/palmer/work/20170725-binutils-2.29/install/bin/../lib/gcc/riscv64-unknown-linux-gnu/7.1.1/../../../../riscv64-unknown-linux-gnu/bin/ld: final link failed: Symbol needs debug section which does not exist
collect2: error: ld returned 1 exit status

On the surface this looks like a super scary error message: there are all sorts of references to temporary objects; the mention of symbols, sections and relocations; and an odd message about debug sections. This is usually the point at which people give up and call a toolchain hacker, but with your newfound knowledge of relocations you should be able to figure out what's going on here.

First, let's focus on only the important part of the error message and ignore all the cruft that's not actually relevant. The actual error you want to look at here is:

reloc_fail.c:(.text+0x0): relocation truncated to fit: R_RISCV_HI20 against symbol `global_symbol'

which simply states that the compiler generated an R_RISCV_HI20 relocation against the symbol global_symbol, but that the linker was unable to fit the symbol's full address into the bits specified by that relocation. The phrase "truncated to fit" is a bit odd: what the linker is actually saying is that the address would have to be truncated in order to fit into the bits allocated by the relocation, but since this is an error the linker isn't really truncating anything.

In order to start really delving into the "why" of the error message, we need to first look at the input to the linker, which in this case is the object file generated by the assembler. Like the above example, we need the relocation because the compiler needs to reference a global symbol that it can't know the address for.

$ riscv64-unknown-linux-gnu-objdump -d -r reloc_fail.o
reloc_fail.o:     file format elf64-littleriscv

Disassembly of section .text:

0000000000000000 <_start>:
   0:   000007b7                lui     a5,0x0
                        0: R_RISCV_HI20 global_symbol
                        0: R_RISCV_RELAX        *ABS*
   4:   0007a503                lw      a0,0(a5) # 0 <_start>
                        4: R_RISCV_LO12_I       global_symbol
                        4: R_RISCV_RELAX        *ABS*
   8:   8082                    ret

We can't actually see the linker output because it's impossible to link this file. Since I hate doing arithmetic by hand, I instead just went ahead and modified the linker to omit the range check when performing relocations with the patch shown below:

$ git diff
diff --git a/bfd/elfnn-riscv.c b/bfd/elfnn-riscv.c
index 3c04507623c3..f8a97411de35 100644
--- a/bfd/elfnn-riscv.c
+++ b/bfd/elfnn-riscv.c
@@ -1492,8 +1492,6 @@ perform_relocation (const reloc_howto_type *howto,
     case R_RISCV_GOT_HI20:
     case R_RISCV_TLS_GOT_HI20:
     case R_RISCV_TLS_GD_HI20:
-      if (ARCH_SIZE > 32 && !VALID_UTYPE_IMM (RISCV_CONST_HIGH_PART (value)))
-       return bfd_reloc_overflow;
       value = ENCODE_UTYPE_IMM (RISCV_CONST_HIGH_PART (value));
       break;

With the above patch, the linker can generate an incorrect executable that we can inspect, which I've shown below:

$ riscv64-unknown-linux-gnu-objdump -d -t reloc_fail
reloc_fail:     file format elf64-littleriscv

SYMBOL TABLE:
00000000800000b0 l    d  .text  0000000000000000 .text
00000000800010c0 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .comment       0000000000000000 .comment
0000000000000000 l    df *ABS*  0000000000000000 reloc_fail.c
00000000800018ba g       .text  0000000000000000 __global_pointer$
00000000800010c0 g     O .bss   0000000000000008 global_symbol
00000000800000b0 g     F .text  000000000000000a _start
00000000800010ba g       .bss   0000000000000000 __bss_start
00000000800010ba g       .bss   0000000000000000 _edata
00000000800010c8 g       .bss   0000000000000000 _end

Disassembly of section .text:

00000000800000b0 <_start>:
    800000b0:   800017b7                lui     a5,0x80001
    800000b4:   0c07a503                lw      a0,192(a5) # ffffffff800010c0 <__global_pointer$+0xfffffffefffff806>
    800000b8:   8082                    ret

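As we can clearly see, the address computed by the instructions that load global_symbol (0xffffffff800010c0, per the disassembly comment) does not match the address of global_symbol listed in the symbol table (0x00000000800010c0), which is exactly what the relocation truncated to fit error message is trying to say. In the particular case of the R_RISCV_HI20+R_RISCV_LO12_I relocation pair, the largest positive address that can be generated is just under 0x80000000 (0x7FFFF7FF to be exact: a lui of 0x7FFFF plus the largest positive 12-bit offset, 0x7FF). Remember that on RV64 the result of lui is sign-extended from 32 bits, so any larger immediate produces an address at the very top of the 64-bit address space rather than the one we asked for.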

While every architecture performs some relocations when linking, RISC-V leverages the linker's relocation infrastructure more aggressively than any other architecture so these sorts of issues may crop up more frequently than in other ports. We'll be talking a lot about relocations in the blog as they frequently drive other toolchain design issues.