BGGP5: Download
This year, I participated in the Binary Golf Grand Prix, a competition where the goal is to create the smallest possible binary to accomplish a given task. The theme of this year’s event was “Download”, and the goal was to create the smallest program that could download and display the text file at https://binary.golf/5/5
.
I decided to make my entry an ELF x86_64 binary. Since this is my first attempt at doing this sort of thing, I was more focused on learning about the ELF file format than I was on being competitive. Most of the techniques I used for this are already pretty well known, and I recommend checking out the articles I link to throughout this writeup to get a more in-depth understanding of how these techniques work. I was also pretty surprised by how much of a difference there was between an executable produced by gcc and an executable created manually - the ELF produced by gcc without any special compiler flags was nearly 100 times the size of my final entry! Throughout this writeup, I’ve tried to explain where all that extra overhead comes from.
A First Attempt
I started off with a simple assembly program that prints the contents of the BGGP5 page using curl
. (One could argue that it’s cheating to call an external binary to do all the work for you, but unfortunately I didn’t have enough free time this month to learn how SSL works.)
section .data
path: db "/bin/curl", 0
arg: db "https://binary.golf/5/5", 0
argv: dq path, arg
dq 0
section .text
global _start
_start:
mov rdi, path
mov rsi, argv
xor rdx, rdx
mov rax, 59 ; sys_execve
syscall
These instructions take up 30 bytes:
00000000: 48 bf 00 20 40 00 00 00 00 00 48 be 22 20 40 00 H.. @.....H." @.
00000010: 00 00 00 00 48 31 d2 b8 3b 00 00 00 0f 05 ....H1..;.....
To get a sense of what the compiler would normally produce, we’ll start by compiling the binary in the most naive way possible, without any tricks. The resulting executable is 13696 bytes, which is certainly a lot more than we’d expect from a sequence of instructions that’s only 30 bytes long! Stripping symbols with the -s
flag helps a little, but not by much - we’re still at 13224 bytes.
Looking at the output of readelf
, we can see a list of 13 sections:
Section Headers:
[Nr] Name Type Address Offset Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000238 00000238 000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.gnu.bu[...] NOTE 0000000000000254 00000254 0000000000000024 0000000000000000 A 0 0 4
[ 3] .gnu.hash GNU_HASH 0000000000000278 00000278 000000000000001c 0000000000000000 A 4 0 8
[ 4] .dynsym DYNSYM 0000000000000298 00000298 0000000000000018 0000000000000018 A 5 1 8
[ 5] .dynstr STRTAB 00000000000002b0 000002b0 0000000000000001 0000000000000000 A 0 0 1
[ 6] .rela.dyn RELA 00000000000002b8 000002b8 0000000000000060 0000000000000018 A 4 0 8
[ 7] .text PROGBITS 0000000000001000 00001000 000000000000001e 0000000000000000 AX 0 0 16
[ 8] .eh_frame PROGBITS 0000000000002000 00002000 0000000000000000 0000000000000000 A 0 0 8
[ 9] .dynamic DYNAMIC 0000000000002ee0 00002ee0 0000000000000120 0000000000000010 WA 5 0 8
[10] .data PROGBITS 0000000000003000 00003000 000000000000003a 0000000000000000 WA 0 0 4
[11] .symtab SYMTAB 0000000000000000 00003040 0000000000000108 0000000000000018 12 7 8
[12] .strtab STRTAB 0000000000000000 00003148 0000000000000038 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 00003180 0000000000000079 0000000000000000
When I was experimenting, I found that if I disabled PIE with the -no-pie
flag, the .interp
, .gnu.hash
, .dynsym
, .dynstr
, .rela.dyn
, .eh_frame
, .dynamic
, .symtab
, and .strtab
sections were no longer present, resulting in a file that’s 8616 bytes. That’s still pretty bad, but it’s a significant decrease in size.
These extra sections are related to the relocations necessary to run a position-independent executable. Since a position-independent executable could potentially be loaded at any base address, the program needs to store metadata describing what addresses need to be modified. For example, the instruction mov rdi, path
is represented as mov rdi, 0x3000
(48 bf 00 30 00 00 00 00 00 00
) when viewed in a disassembler, but when we run the program in GDB the disassembly reads movabs rdi, 0x555555557000
(48 bf 00 70 55 55 55 55 00 00
). The relocation information indicated that the address 0x3000
was a relative address that needed to be added to the base address 0x555555554000
.
However, if we disable PIE, we don’t need to relocate anything - we know that path
will always be loaded at the address 0x402000
, so the instruction mov rdi, path
can simply be represented as mov rdi, 0x402000
in the .text
section.
Section Alignment
8616 bytes is still a lot for such a simple program, so what’s going on? Looking at the executable in a hex editor, the most obvious problem is that the instructions in the .text
section are followed by a sequence of null bytes much longer than the instructions themselves.
00000000: 48 bf 00 20 40 00 00 00 00 00 48 be 22 20 40 00 H.. @.....H." @.
00000010: 00 00 00 00 48 31 d2 b8 3b 00 00 00 0f 05 00 00 ....H1..;.......
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
The reason for this has to do with page alignment. Since our executable contains sections with different permissions (.text
has permissions r-x
, and .data
section has permissions rw-
), the sections must be loaded in different pages. In order to handle this, padding bytes are inserted so that both the .text
and .data
sections start at a page-aligned address.
We can deal with this by passing the -N
option to ld. This combines the .text
and .data
sections into a single rwx
segment, thus removing the need for padding. Without the padding, our file is down to 680 bytes - a marked improvement over our earlier attempts, but still not great.
Building an ELF file manually
Let’s take a look at what’s left in the file at this point:
- The ELF header (64 bytes), program headers (56 * 2 = 112 bytes), and section headers (64 * 5 = 320 bytes).
- The
.text
section containing the instructions themselves (30 bytes). - The
.data
section containing the arguments tocurl
(58 bytes). - The
.note.gnu.build-id
section (36 bytes). - The
.shstrtab
section (42 bytes). - Sequences of 0x00 bytes to keep the sections 16 byte aligned (20 bytes).
We clearly need the instructions in the .text
section and the arguments in the .data
section, but the other sections are completely unnecessary. How do we compile a binary without them?
At this point, we’re running out of ways to tell the compiler to generate a smaller binary for us, so it’s time to start doing things by hand. We can create a minimal ELF file that doesn’t contain any extraeneous information, then insert our data and instructions into it manually. To do so, we’ll construct the ELF headers ourselves and compile the executable as a flat binary in nasm, which gets us to 208 bytes. Brian Raiter’s writeup on tiny ELF files was extremely helpful for me in figuring out how to do this.
Our program so far:
BITS 64
org 0x400000
ehdr:
db 7Fh, "ELF" ; magic
db 2 ; class
db 1 ; encoding
db 1 ; version
db 0 ; os
db 0 ; abi_version
db 0,0,0,0,0,0,0 ; padding
dw 2 ; e_type
dw 3Eh ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr:
dd 1 ; p_type
dd 7 ; p_flags
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x10 ; p_align
phdrsize equ $ - phdr
args:
path: db "/bin/curl", 0
arg: db "https://binary.golf/5/5", 0
argv: dq path, arg
dq 0
_start:
mov rdi, path
mov rsi, argv
xor rdx, rdx
mov rax, 59 ; sys_execve
syscall
filesize equ $ - $$
(When I was troubleshooting, I also found that GDB no longer recognizes our program as a valid executable, presumably because we don’t have a section header. I’m a little surprised I haven’t run into any malware that uses this as an anti-debugging technique.)
Storing data in the headers
There are a few fields in the ELF header that aren’t necessary for our executable to run. The 7-byte padding
field is an obvious first place to try storing data, and the path /bin/curl
isn’t much longer than that. Turns out the version
, os
, and abi_version
fields don’t matter either, so we can modify the start of the ELF header to look like this:
ehdr:
db 7Fh, "ELF" ; magic
db 2 ; class
db 1 ; encoding
path:
db "/bin/curl", 0 ; version, os, abi_version, padding
The p_align
field at the end of the program header also isn’t used, so we can use that space to instead store the first 8 bytes of the URL https://binary.golf/5/5
. In addition, it seems that the area in memory immediately after our executable code is filled with 0s, so if we put argv
right at the end of the file then it doesn’t need to be null terminated.
Other unused fields can be used to store the instructions themselves. The e_shoff
field of the ELF header is unused since we don’t have a section header, so we can use that field as well as the adjacent unused e_flags
field to store 12 bytes of instructions. The p_paddr
field of the program header can hold another 8 bytes. We can cut down a little bit on instruction size by replacing our 64-bit operands to mov
and xor
with 32-bit operands, thereby allowing us to fit the entirety of the program into the headers.
We can also remove the last 6 bytes of the ELF header, causing it to overlap with the start of the program header. We can get away with this because e_shentsize
and e_shstrndx
are never parsed if we don’t actually have any section headers.
Pretty much all of the tricks I used here are shamelessly stolen from Nathan Otterness’ writeup on small ELF files, which includes a helpful graph showing which bytes of the ELF header can be used to store arbitrary data. xcellerator’s article on the same subject goes into some more detail on what some of these fields are and why we can overwrite them.
Our final result now looks like this:
BITS 64
org 0x400000
ehdr:
db 7Fh, "ELF" ; magic
db 2 ; class
db 1 ; encoding
path:
db "/bin/curl", 0 ; version, os, abi_version, padding
dw 2 ; e_type
dw 3Eh ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq phdr - $$ ; e_phoff
_start:
mov edi, path ; e_shoff, e_flags
mov esi, argv
jmp part2
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
ehdrsize equ $ + 6 - ehdr
phdr:
dd 1 ; p_type, e_shentsize, e_shnum
dd 7 ; p_flags, e_shstrndx
dq 0 ; p_offset
dq $$ ; p_vaddr
part2: ; p_paddr
xor edx, edx
xor eax, eax
mov al, 59 ; sys_execve
syscall
dq filesize ; p_filesz
dq filesize; p_memsz
phdrsize equ $ - phdr + 8
filesize equ $ - $$
arg: db "https://binary.golf/5/5", 0
argv: dq path, arg
This gets us an executable that’s only 146 bytes!
00000000: 7f45 4c46 0201 2f62 696e 2f63 7572 6c00 .ELF../bin/curl.
00000010: 0200 3e00 0100 0000 2800 4000 0000 0000 ..>.....(.@.....
00000020: 3a00 0000 0000 0000 bf06 0040 00be 8200 :..........@....
00000030: 4000 eb1e 3a00 3800 0100 0100 0000 0700 @...:.8.........
00000040: 0000 0000 0000 0000 0000 0000 4000 0000 ............@...
00000050: 0000 31d2 31c0 b03b 0f05 6a00 0000 0000 ..1.1..;..j.....
00000060: 0000 6a00 0000 0000 0000 6874 7470 733a ..j.......https:
00000070: 2f2f 6269 6e61 7279 2e67 6f6c 662f 352f //binary.golf/5/
00000080: 3500 0600 4000 0000 0000 6a00 4000 0000 5...@.....j.@...
00000090: 0000
Potential Improvements
I was able to produce a much smaller executable than I expected, but I still suspect it may be possible to do a little better. In particular, it’s seriously bothering me that I couldn’t think of a way to fit the argv
array into the ELF header or program header somehow.
When I was experimenting with this, I found that argv[0]
doesn’t have to be the path to the curl
executable - that’s just a convention, and it’s not something that curl
relies on the be true. It does, however, have to be a pointer to valid memory. So in order to fit argv
into the header, you’d need to construct a pattern that looks like the following:
[any valid pointer][pointer to URL string][null terminator]
This almost fits in the program header - p_vaddr
is a valid pointer, and it’s followed by p_paddr
, where we can put anything. However, the field that follows it is p_filesz
, which we can’t set to 0. I’m still not convinced it’s totally impossible to sneak argv
in somewhere, though. Maybe I’ll think of it before next year’s BGGP!