Introspection

DG'hAck 2020 - RE (150 points).

Description

Sometimes, you have to look back in order to understand the things that lie ahead.

A colleague of yours has sent you a program he developed and wants to challenge you.

Validate your access and give him the flag.

File: chall.

TL;DR

The program implements a well-known anti-disassembly technique: specific code and data bytes are interleaved with the real instructions so that disassemblers such as IDA or Ghidra produce an inaccurate instruction listing. Removing these bytes allows us to recover the legitimate control flow and understand how to solve the challenge.

Reverse engineering

The file we’re given is an x86_64 ELF executable:

# file chall
chall: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=881d6fe5c40b07c2064e5a6efd5ad125d702826a, not stripped

The binary is not stripped, so let’s open it in IDA.

main function analysis

The main function is quite simple as it only saves the arguments passed to the program in a g_argv variable of the .bss segment and calls the check_password function.

Here’s the corresponding pseudocode:

main pseudocode

check_password function analysis

At first glance, this function also seems to be pretty simple:

  • read the first 4 bytes of the file referenced by g_argv[0] (here ./chall)
  • the number of bytes actually read (which should be 4) defines the number of rounds of an encoding loop made of XOR and left-shift operations
  • this loop produces a kind of 32-bit hash, which must be equal to 0x709E9614 for our password to be validated

Here’s the corresponding pseudocode:

check_password pseudocode

Let’s fire up z3 and see what it takes to find a valid password and get the flag!

Finding a valid password with z3

Since the program has been statically linked and uses tons of libc functions, I would rather use z3 than angr and focus on the check_password function:

#!/usr/bin/env python3

from z3 import *

def get_password_hash(password):
    # result of fread.
    buf = b'\x7fELF'
    rounds = buf_len = len(buf)
    # encryption loop.
    hash = key = password
    for i in range(1, rounds):
        key = (key << 8) + buf[i % buf_len]
        hash ^= key
    return hash

s = Solver()

password = BitVec('password', 32)

s.add(get_password_hash(password) == 0x709e9614)

while s.check() == sat:
    m = s.model()
    m_pass = m[password].as_long()
    print(f'Password: {m_pass:#x}')
    s.add(password != m_pass)

Result:

# python3 solve.py 
Password: 0xab44c45b
# ./chall 0xab44c45b
Valid password found!
Trusted data: ELF
Nice try :/

Well that was not expected… If we take another look at the main function, we notice that the password_buf variable is global and is used by the check_password function and the main function for different purposes.

Let’s set a watchpoint to see how the buffer is used throughout the program and see if we can somehow prevent the value '\x7f' from being placed in the buffer.

Debugging

password_buf tracing with gef

According to the gdb manual, a watchpoint stops the execution of the program whenever the value of an expression changes. Let’s add one to trace changes to the first byte of password_buf.

In order to have a clean output in gdb and to automate our tracing as much as possible, we can use gdb-gef with a command file containing specific configurations to be applied to our debugging session:

# configure the output layout.
gef config context.layout "trace memory"
gef config context.clear_screen 0

# use software watchpoint.
set can-use-hw-watchpoints 0

# start the program (and load symbols).
start 0x1234

# set a watchpoint for the first buffer byte value.
memory watch &buf 0x1 byte
watch * (char *) &buf

# remove useless output.
commands
    silent
    continue
end

# run the program and quit.
continue
quit

Output:

gef watchpoint

Okay, there seem to be some operations on the first byte of the buffer, let’s break them down:

  • the program starts by copying the value of argv[1] into password_buf using the strncpy function
  • then the program calls the fread function to read 4 bytes from its own file

Well, that’s what it’s supposed to do… Instead, the call to fread, which was supposed to call the _IO_sgetn function, is followed by recursive calls to a strange __do_global_init function:

fread calling sgetn

io_sgetn

vtable corruption

Reading some documentation, we learn that the functions associated with a FILE structure pointer are referenced in a jump table (or vtable) called _IO_file_jumps. This table is itself referenced by a structure called _IO_FILE_plus, which contains both the _IO_FILE structure (an alias of the FILE structure) and a pointer to the vtable (source). Since this table is shared by several FILE structures (sources: here and here), it can be modified to hook a function or to hide code.

Let’s inspect our current vtable:

file vtable

Looking closely at each vtable entry, we notice that the pointer associated with the xsgetn function seems to have been replaced to point to the __do_global_init function, which explains our previous backtraces.

According to the documentation (here), xsgetn is the function actually in charge of reading n bytes when fread is called, and it returns the number of characters actually read. That should be interesting!
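To make the hooking idea concrete, here is a toy Python model of a shared jump table. It is not glibc code: the File class, the fread helper and both reader functions are hypothetical stand-ins, and only the slot name xsgetn is borrowed from the text above. It simply shows that overwriting one slot of a shared table redirects every stream that dispatches through it.

```python
def real_xsgetn(fp, n):
    """Stand-in for the genuine reader: return the next n bytes of the stream."""
    chunk = fp.data[fp.pos:fp.pos + n]
    fp.pos += len(chunk)
    return chunk

def hooked_xsgetn(fp, n):
    """Stand-in for a hook: ignores the stream and returns bytes of its own choosing."""
    return b'HOOK'[:n]

class File:
    """Hypothetical stream object holding its data and a reference to a jump table."""
    def __init__(self, data, vtable):
        self.data, self.pos, self.vtable = data, 0, vtable

def fread(fp, n):
    # Dispatch through the table instead of calling the reader directly.
    return fp.vtable['xsgetn'](fp, n)

# Both streams share the same table, as several FILE structures do.
shared_vtable = {'xsgetn': real_xsgetn}
a = File(b'hello', shared_vtable)
b = File(b'world', shared_vtable)

before = fread(a, 2)                        # genuine read
shared_vtable['xsgetn'] = hooked_xsgetn     # overwrite a single slot
after_a = fread(a, 2)                       # now served by the hook
after_b = fread(b, 2)                       # the other stream is affected too
print(before, after_a, after_b)
```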

__do_global_init function analysis

When displaying this function in IDA, it seems obvious that there is a problem:

__do_global_init anti disassembly

Even if we try to fix the function boundaries (using the p and e shortcuts), the IDA disassembly process doesn’t work properly and shows us both data and code:

__do_global_init fixed bounds

There are two common disassembly techniques:

  • flow-oriented disassembler: disassembles all bytes that are part of the execution flow
  • linear disassembler: iterates over a block of code, disassembling one instruction at a time. The size of the last disassembled instruction is used to determine the offset of the next instruction.

Here, it appears that the challenge author took advantage of the fact that most disassemblers decode the bytes immediately following a call instruction before processing the call target. The code then relies on information the disassembler doesn’t have access to, in our case the return address pointer, so the disassembler ends up decoding bytes that are never executed and produces erroneous and inaccurate results.

The call instruction pushes a return address pointer on the stack. When the function is analyzed, the disassembler prematurely terminates it because of the “rogue ret” instruction.

Let’s analyze each of these instructions:

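  • call .+5 (\xe8\x00\x00\x00\x00): this instruction calls the location immediately following itself. It pushes the address of the next instruction (the return address) on the stack and continues execution there, which is roughly equivalent to:
push <address of the next instruction>
jmp <address of the next instruction>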
  • add dword ptr [rsp], 0x9 (\x83\x04\x24\x09): this instruction adds 9 to the return address pointer
  • ret (\xc3): this instruction pops the return address from the stack and jumps to it
  • \x48\xd8\xfe\xca: these bytes following the ret instruction are not valid instructions and will never be executed, but they were analyzed and defined as data once the first call instruction was determined not to be part of any function because of the “rogue ret” instruction
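The arithmetic behind this sequence can be checked with a few lines of Python. The sketch below is only an illustration under assumptions: it scans raw bytes for the exact encodings listed above, treats the 8-bit immediate as a small positive value, and the helper name find_trampolines as well as the bytes placed around the pattern in the sample are made up for the example.

```python
import re

# call $+5 ; add dword ptr [rsp], imm8 ; ret
PATTERN = re.compile(
    re.escape(b'\xe8\x00\x00\x00\x00\x83\x04\x24') + b'(.)' + re.escape(b'\xc3'),
    re.DOTALL,
)

def find_trampolines(code: bytes):
    """Yield (start, resume) offsets for each call/add/ret sequence found."""
    for m in PATTERN.finditer(code):
        pushed = m.start() + 5            # return address pushed by the 5-byte call
        resume = pushed + m.group(1)[0]   # value on the stack after `add [rsp], imm8`
        yield m.start(), resume

# Hypothetical surrounding bytes: push rbp, the sequence, 4 junk bytes, pop rbp, ret.
sample = (b'\x55'
          + b'\xe8\x00\x00\x00\x00\x83\x04\x24\x09\xc3'
          + b'\x48\xd8\xfe\xca'
          + b'\x5d\xc3')

hits = list(find_trampolines(sample))
for start, resume in hits:
    print(f'sequence at {start:#x}, execution resumes at {resume:#x}')
```

With an immediate of 9, the resume offset skips the add (4 bytes), the ret (1 byte) and the 4 junk bytes, which is exactly the span the disassembler mislabels.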

To fix the function flow, we can replace these instructions with nop instructions and reset the function boundaries to cover the actual function instructions. I wrote an IDA python script to automate this task:

import idc
import idaapi
import idautils

idc.ida_expr.compile_idc_text('static fix_disas_shortcut() { exec_python("fix_anti_disas()"); }')

idc.add_idc_hotkey('ctrl-<', 'fix_disas_shortcut')

def fix_anti_disas():
    """
    Basically search for the following anti-disassembly instruction set, nop it and fix function boundaries:

        call .+5
        add [rsp], val
        ret
        .db XX
        .db XX
        .db XX

    """
    # get function from cursor position.
    cursor_ea = idc.get_screen_ea()
    if idautils.ida_funcs.get_func(cursor_ea):
        # get current function boundaries.
        func_start_ea = idautils.ida_funcs.get_func(cursor_ea).start_ea
        func_end_ea = idautils.ida_funcs.get_func(cursor_ea).end_ea

        # loop through all instructions in the function.
        insn_ea = func_start_ea
        while insn_ea <= func_end_ea:
            print(f'Current ea: {insn_ea:#x}')

            # make sure we're working with an instruction (even if we overlap with an existing one).
            if not idc.create_insn(insn_ea):
                for ea in range(insn_ea, idc.next_head(insn_ea) + 1):
                    idc.ida_bytes.del_items(ea, 0, 1)
                idc.create_insn(insn_ea)

            # get mnemonic of the current instruction (e.g., call, add, jmp).
            insn_mnem = idautils.DecodeInstruction(insn_ea).get_canon_mnem()
            # get operands of the current instruction (e.g., registers, value, address).
            insn_ops = idautils.DecodeInstruction(insn_ea).ops

            if insn_mnem == 'call':
                print(f'Found a call at: {insn_ea:#x}')
                called_ea = insn_ops[0].addr

                # get the immediate next instruction address and operands.
                im_next_insn_ea = insn_ea + idc.get_item_size(insn_ea)
                im_next_insn_mnem = idautils.DecodeInstruction(im_next_insn_ea).get_canon_mnem()
                im_next_insn_op1 = idaapi.get_reg_name(idc.get_operand_value(im_next_insn_ea, 0), 8)
                im_next_insn_op2 = idc.get_operand_value(im_next_insn_ea, 1)

                if called_ea == im_next_insn_ea \
                    and im_next_insn_mnem == 'add' \
                    and im_next_insn_op1 == 'rsp':
                    print(f'Found anti-disas at {insn_ea:#x}')

                    # get the real instruction address.
                    im_next_insn_ea += im_next_insn_op2
                    print(f'Real next instruction {im_next_insn_ea:#x}')

                    # nop out intermediate bytes and convert them into code.
                    for ea in range(insn_ea, im_next_insn_ea):
                        idc.ida_bytes.patch_byte(ea, 0x90)
                        idc.create_insn(ea)
                        print(f'Nopped {ea:#x}')

                    insn_ea = im_next_insn_ea
                else:
                    insn_ea = insn_ea + idc.get_item_size(insn_ea)
            elif insn_mnem == 'retn':
                print(f'End at {insn_ea:#x}')
                break
            else:
                insn_ea = insn_ea + idc.get_item_size(insn_ea)

            # if necessary, change the function boundaries to cover the next instruction.
            if not idautils.ida_funcs.get_func(insn_ea):
                func_end_ea = insn_ea + idc.get_item_size(insn_ea)
                idaapi.set_func_end(func_start_ea, func_end_ea)
                print(f'New function boundaries: [{func_start_ea:#x} - {func_end_ea:#x}]')
    else:
        print(f'No function at {cursor_ea:#x}')

To apply the script to the __do_global_init function, we just move our cursor into the function and press the ctrl-< keys.

Here is the simplified result (without nops):

__do_global_init fixed

And, here is the corresponding pseudocode:

xsgetn_hook pseudocode

xor_data pseudocode

Except for the use of an undocumented calling convention and the combination of calls and jumps to the same code block, this function is pretty simple to understand:

  • the password_buf (passed through the data char pointer) is xored with '\xa1'
  • if the encoded password_buf contains the word 0x85acaba2 at offset 8 (which is “getf” xored with '\xa1'), we replace the password_buf content with the decoded flag format string (DGA{VT4bl3_h0ok_%x!})
  • else, we replace the password_buf content with the ELF file magic bytes (\x7fELF)

It’s worth noting that the returned size is not the same in both cases:

  • if the password_buf contains getf, we return a fake size of 0xa8, which will give us 0x2a at the end of fread (source)
  • else, we return 0x10, which will give us 0x4 at the end of fread
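These conversions follow from how glibc's fread turns the byte count reported by xsgetn into an item count. Here is a minimal Python model of that final step, assuming the call is equivalent to fread(buf, 4, 1, fp): the item size of 4 is inferred from 0xa8 / 4 = 0x2a, while the count of 1 and the helper name fread_result are assumptions made for the example.

```python
def fread_result(bytes_read: int, size: int, count: int) -> int:
    """Model of the value fread returns once xsgetn has reported bytes_read."""
    bytes_requested = size * count
    if bytes_requested == 0:
        return 0
    # A complete read returns count; otherwise the byte count is divided by size.
    return count if bytes_read == bytes_requested else bytes_read // size

flag_path = fread_result(0xa8, size=4, count=1)   # hook found "getf"
elf_path = fread_result(0x10, size=4, count=1)    # hook did not find it
print(hex(flag_path), hex(elf_path))  # 0x2a 0x4
```

Under these assumptions, a hook that honestly reported 4 bytes would make fread return 1, not 4, which would explain why the hook returns 0x10 in the default case.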

Actual challenge solving

We can reuse the previous script and change the buffer bytes and rounds count before entering the encryption loop:

# python3 solve.py 
Password: 0x934cc553
# ./chall 934cc553getf  # make sure we have getf at offset 8.
Valid password found!
Trusted data: DGA{VT4bl3_h0ok_934cc553!}
Congrats!
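The change to the script amounts to two values in get_password_hash: buf becomes the flag format string and rounds becomes 0x2a. As a sanity check, here is a plain-Python (no z3) version of the same loop. It assumes the loop works on 32-bit values, and the extra buf and rounds parameters are introduced here for illustration only.

```python
MASK32 = 0xffffffff
TARGET = 0x709e9614

def get_password_hash(password: int, buf: bytes, rounds: int) -> int:
    """Concrete version of the loop from solve.py, truncated to 32 bits."""
    hash_ = key = password & MASK32
    for i in range(1, rounds):
        key = ((key << 8) + buf[i % len(buf)]) & MASK32
        hash_ ^= key
    return hash_

# First attempt: the buffer holds the ELF magic and fread reports 4.
elf_hash = get_password_hash(0xab44c45b, b'\x7fELF', 4)
# Second attempt: the buffer holds the flag format string and fread reports 0x2a.
flag_hash = get_password_hash(0x934cc553, b'DGA{VT4bl3_h0ok_%x!}', 0x2a)

print(elf_hash == TARGET, flag_hash == TARGET)  # True True
```

Both passwords hash to the expected value, but only the second one does so with the buffer contents and round count the program actually uses once getf is present.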

Flag

The final flag is: DGA{VT4bl3_h0ok_934cc553!}

Happy Hacking!

Creased