12.16 Countering Disassembly
12.16.1 Problem
An object file disassembler can
produce an assembly language version of a binary, which can then be
used to understand and possibly modify the binary.
12.16.2 Solution
Anti-disassembly tricks are useful in frustrating automatic analysis,
but they generally will not hold up to a human review of the
disassembly. Make sure to combine the methods presented in the
discussion with data or code obfuscation techniques.
12.16.3 Discussion
Many disassemblers assume that long runs of
NULL bytes are data, although some will
continue to disassemble regardless. In the Intel instruction set,
the byte pair 0x00 0x00 encodes add [eax], al
(addb %al, (%eax) in AT&T syntax), which is a valid instruction.
The following macros use NULL bytes to modify the eax
register by pushing eax, loading the address of
the pushed value into eax, executing
add [eax], al instructions as many times as the
user specifies, and popping the modified value back into eax.
#define NULLPAD_START  asm volatile ( \
        "pushl %eax \n" \
        "movl %esp, %eax\n")
#define NULLPAD        asm volatile ("addb %al, (%eax)\n")
#define NULLPAD_END    asm volatile ("popl %eax\n")
/* Each NULLPAD assembles to two NULL bytes, so five of them emit ten. */
#define NULLPAD_10     NULLPAD_START; \
        NULLPAD; NULLPAD; NULLPAD; NULLPAD; NULLPAD; \
        NULLPAD_END
This is particularly effective if the value referenced by
eax—that is, the value at the top of the
stack—is used later in the program. Note that many
disassemblers that ignore runs of NULL bytes allow
the user to override this behavior.
To demonstrate the effect this macro has on disassemblers, the
following source code was compiled and disassembled:
void my_func(void) {
  int x;
  NULLPAD_10;
  for (x = 0; x < 10; x++) printf("%x\n", x);
}
DataRescue's
IDA Pro disassembler creates a code/data boundary at the start of the
NULL bytes, and completely ignores the
instructions that follow:
08048374 my_func:
08048374 55 push ebp
08048375 89 E5 mov ebp, esp
08048377 83 EC 08 sub esp, 8
0804837A 50 push eax
0804837B 89 E0 mov eax, esp
0804837B ; ------------------------------------------------------------------
0804837D 00 db 0 ;
0804837E 00 db 0 ;
0804837F 00 db 0 ;
08048380 00 db 0 ;
08048381 00 db 0 ;
08048382 00 db 0 ;
08048383 00 db 0 ;
08048384 00 db 0 ;
08048385 00 db 0 ;
08048386 00 db 0 ;
08048387 58 db 58h ; X
08048388 C7 db 0C7h ; +
08048389 45 db 45h ; E
0804838A FC db 0FCh ; n
0804838B 00 db 0 ;
0804838C 00 db 0 ;
0804838D 00 db 0 ;
The GNU
objdump utility collapses most of the
NULL bytes into an ellipsis, but the rest of the disassembly is
not affected:
08048374 <my_func>:
8048374: 55 push %ebp
8048375: 89 e5 mov %esp,%ebp
8048377: 83 ec 08 sub $0x8,%esp
804837a: 50 push %eax
804837b: 89 e0 mov %esp,%eax
...
8048385: 00 00 add %al,(%eax)
8048387: 58 pop %eax
8048388: c7 45 fc 00 00 00 00 movl $0x0,0xfffffffc(%ebp)
804838f: 83 7d fc 09 cmpl $0x9,0xfffffffc(%ebp)
8048393: 7e 02 jle 8048397 <my_func2+0x23>
8048395: eb 1a jmp 80483b1 <my_func2+0x3d>
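The effect on a disassembler that treats NULL runs as data can be approximated with a short sketch. The byte array reproduces what NULLPAD_10 assembles to according to the listings above; the function find_null_run and its min_run threshold are hypothetical, chosen only for illustration, and do not describe how IDA Pro or any other tool actually decides where code ends.

```c
#include <stddef.h>

/* Bytes emitted by NULLPAD_10, taken from the listings in this recipe:
 * push eax; mov eax, esp; five two-byte "00 00" adds; pop eax. */
const unsigned char nullpad_bytes[] = {
    0x50, 0x89, 0xE0,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x58
};

/* Hypothetical heuristic: return the offset at which the first run of at
 * least min_run NULL bytes begins, or len if no such run exists. A tool
 * using a rule like this would stop treating bytes as code at that offset. */
size_t find_null_run(const unsigned char *buf, size_t len, size_t min_run)
{
    size_t run_start = 0, run_len = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00) {
            if (run_len == 0)
                run_start = i;          /* a new run begins here */
            if (++run_len >= min_run)
                return run_start;       /* run is long enough to flag */
        } else {
            run_len = 0;                /* run broken by a non-NULL byte */
        }
    }
    return len;
}
```

With a threshold of 8, the call find_null_run(nullpad_bytes, sizeof nullpad_bytes, 8) reports offset 3, the byte immediately after mov eax, esp, which is where the code/data boundary appears in the IDA Pro listing. Raising the threshold above 10 makes the run go unnoticed, which is one reason the length of the padding matters.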
Most disassemblers can be fooled by a simple
misalignment error—for example, jumping into the middle of an
instruction so that the target of the jump is disassembled
incorrectly. The typical technique of performing an unconditional
jump into another instruction is not very effective with
disassemblers that follow the flow of execution—the jump will
be followed, and the bytes between the jump and the jump target will
be ignored. Instead, you can use a conditional jump, followed by the
first byte of a multibyte instruction (0x0F is
ideal for this, because it is the first byte of all two-byte
opcodes); this way, a flow-of-execution disassembler will disassemble
the code after the conditional
branch.
#define DISASM_MISALIGN asm volatile ( \
" pushl %eax \n" \
" cmpl %eax, %eax \n" \
" jz 0f \n" \
" .byte 0x0F \n" \
"0: \n" \
" popl %eax \n")
This macro compares the eax register to itself,
which always sets the zero flag; the jz branch is
therefore always taken during execution. A disassembler will
either ignore the jz instruction and interpret the
0x0F byte that follows as the start of an instruction, or it
will follow the jz instruction. If the
jz instruction is followed, the disassembler can
still interpret the code incorrectly if the address after the
jz instruction is disassembled before the address
to which the jz instruction jumps. For example:
void my_func(void) {
  int x;
  DISASM_MISALIGN;
  for (x = 0; x < 10; x++) printf("%x\n", x);
}
IDA Pro disassembles the code after the
jz instruction at address 0804837D before
following the jump itself, resulting in an incorrect disassembly:
08048374 my_func:
08048374 55 push ebp
08048375 89 E5 mov ebp, esp
08048377 83 EC 08 sub esp, 8
0804837A 50 push eax
0804837B 39 C0 cmp eax, eax
0804837D 74 01 jz short near ptr loc_804837F+1
0804837F
0804837F loc_804837F: ; CODE XREF: .text:0804837D#j
0804837F 0F 58 C7 addps xmm0, xmm7
08048382 45 inc ebp
08048383 FC cld
08048383 ; --------------------------------------------------------------------
08048384 00 db 0 ;
08048385 00 db 0 ;
08048386 00 db 0 ;
08048387 00 db 0 ;
08048388 83 db 83h ; â
08048389 7D db 7Dh ; }
0804838A FC db 0FCh ; n
The GNU objdump
disassembler does not follow the jump at all and encounters the same
problem:
08048374 <my_func2>:
8048374: 55 push %ebp
8048375: 89 e5 mov %esp,%ebp
8048377: 83 ec 08 sub $0x8,%esp
804837a: 50 push %eax
804837b: 39 c0 cmp %eax,%eax
804837d: 74 01 je 8048380 <my_func2+0xc>
804837f: 0f 58 c7 addps %xmm7,%xmm0
8048382: 45 inc %ebp
8048383: fc cld
8048384: 00 00 add %al,(%eax)
8048386: 00 00 add %al,(%eax)
8048388: 83 7d fc 09 cmpl $0x9,0xfffffffc(%ebp)
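The difference between decoding straight through the junk byte and decoding from the jump target can be reproduced with a toy decoder. The sketch below is an illustration only: toy_insn_len is a hypothetical length table that recognizes nothing beyond the encodings appearing in the listings above, and sweep is a hypothetical helper that simply records where each decoded instruction starts. Neither reflects the internals of objdump or IDA Pro.

```c
#include <stddef.h>

/* Bytes from the DISASM_MISALIGN listings, starting at "push eax". */
const unsigned char misalign_bytes[] = {
    0x50,                                     /* push eax               */
    0x39, 0xC0,                               /* cmp eax, eax           */
    0x74, 0x01,                               /* jz +1 (skips the 0x0F) */
    0x0F,                                     /* junk byte              */
    0x58,                                     /* pop eax                */
    0xC7, 0x45, 0xFC, 0x00, 0x00, 0x00, 0x00, /* mov dword [ebp-4], 0   */
    0x83, 0x7D, 0xFC, 0x09                    /* cmp dword [ebp-4], 9   */
};

/* Toy length table (an assumption for this example): valid ONLY for the
 * specific encodings in the array above. Returns 0 for anything else. */
size_t toy_insn_len(const unsigned char *p, size_t avail)
{
    size_t len;
    unsigned char op;

    if (avail == 0) return 0;
    op = p[0];
    if (op >= 0x40 && op <= 0x5F) len = 1;   /* inc/dec/push/pop reg   */
    else if (op == 0xFC)          len = 1;   /* cld                    */
    else if (op == 0x00 || op == 0x39 || op == 0x74)
                                  len = 2;   /* add, cmp r/r, jz rel8  */
    else if (op == 0x0F)          len = 3;   /* 0F 58 /r, reg operands */
    else if (op == 0x83)          len = 4;   /* 83 /7 disp8 imm8       */
    else if (op == 0xC7)          len = 7;   /* C7 /0 disp8 imm32      */
    else return 0;
    return len <= avail ? len : 0;
}

/* Decode sequentially from start, storing each instruction's offset in
 * starts[] (at most max entries). Returns the number of entries stored. */
size_t sweep(const unsigned char *buf, size_t len, size_t start,
             size_t *starts, size_t max)
{
    size_t n = 0, off = start;

    while (off < len && n < max) {
        size_t ilen = toy_insn_len(buf + off, len - off);
        if (ilen == 0) break;               /* unknown encoding: stop */
        starts[n++] = off;
        off += ilen;
    }
    return n;
}
```

Sweeping from offset 0 decodes an instruction at offset 5 (the 0x0F byte) that swallows the real pop eax at offset 6, and the decoder does not resynchronize until offset 14, mirroring the objdump output. Sweeping from offset 6, the actual jz target, yields the three instructions that really execute.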
Sophisticated disassemblers attempt
to reconstruct as much as possible of the original source code of the
binary. One of the tasks they perform towards this goal is the
recognition of functions within the binary. Because the end of a
function is generally assumed to be the first return instruction
encountered, it is possible to truncate a function within the
disassembler by providing a false return. The following macro will
return to a byte after the ret instruction,
causing the definition of the function to end
prematurely:
#define DISASM_FALSERET asm volatile ( \
" pushl %ecx /* save registers */\n" \
" pushl %ebx \n" \
" pushl %edx \n" \
" movl %esp, %ebx /* save esp in ebx */\n" \
" movl %ebp, %esp /* esp = frame pointer */\n" \
" popl %ebp /* save old %ebp */\n" \
" popl %ecx /* save return addr */\n" \
" lea 0f, %edx /* edx = addr of 0: */\n" \
" pushl %edx /* return addr = edx */\n" \
" ret \n" \
" .byte 0x0F /* off-by-one byte */\n" \
"0: \n" \
" pushl %ecx /* restore ret addr */\n" \
" pushl %ebp /* restore old %ebp */\n" \
" movl %esp, %ebp /* restore ebp, esp */\n" \
" movl %ebx, %esp \n" \
" popl %edx /* restore registers */\n" \
" popl %ebx \n" \
" popl %ecx \n")
The first three pushl instructions and the last
three popl instructions save and restore the
registers that will be used in the course of the false return. The
current stack pointer is saved in the ebx
register, and the current stack pointer is set to the frame pointer
(ebp) of the current function—this places
the frame pointer of the calling function at the top of the stack.
The saved frame pointer is moved into the ebp
register, and the return address is moved into the
ecx register so that these values can be preserved
across the return. The instruction lea 0f, %edx
stores the address of the local code label 0: in
the edx register. This address is then pushed onto
the stack, where it becomes the new return address. The following
ret instruction causes the program to jump to code
label 0:, where the execution context of the
function (the stack and frame pointers, saved frame pointer, and
return address) is restored to its original state.
When a disassembler follows the control flow of the program, rather
than blindly disassembling instructions from the start of the code
segment, it will encounter the false return statement and will stop
disassembly of the current function. As a result, any instructions
after the false return will not be disassembled, and they will appear
as data located in the code segment.
void my_func(void) {
  int x;
  for (x = 0; x < 10; x++) printf("%x\n", x);
  DISASM_FALSERET;
  /* other stuff can be done here that won't be disassembled */
}
This produces the following disassembly in
IDA Pro:
08048357 51 push ecx
08048358 53 push ebx
08048359 52 push edx
0804835A 89 E3 mov ebx, esp
0804835C 89 EC mov esp, ebp
0804835E 5D pop ebp
0804835F 59 pop ecx
08048360 8D 15 69 83 04 08 lea edx, ds:dword_8048369
08048366 52 push edx
08048367 C3 retn
08048367 my_func endp ; sp = -0Ch
08048367
08048367 ;----------------------------------------------------------------
08048368 0F db 0Fh ;
08048369 51 55 89 E5 dword_8048369 dd 0E5895551h
08048369 ; DATA XREF: my_func+38#r
0804836D 89 db 89h ; ë
0804836E DC db 0DCh ; ?
0804836F 5A db 5Ah ; Z
08048370 5B db 5Bh ; [
08048371 59 db 59h ; Y
08048372 C9 db 0C9h ; +
08048373 C3 db 0C3h ; +
The false return at address 08048367 ends the function, with the
subsequent code not being disassembled. The XREF at address 08048369,
however, clearly indicates that something strange is going on, even
though the disassembly is incorrect. There is also an indication of a
stack error at the endp directive. A cracker can
simply examine the instruction making the reference, in this case
push edx at address 08048366, to realize that the
return address is being overwritten.
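The truncation can be illustrated with a deliberately naive sketch of the "first return ends the function" assumption. The byte array is copied from the IDA Pro listing above; first_ret_offset is a hypothetical helper that scans raw bytes for the ret opcode (0xC3). A real disassembler decodes instructions rather than scanning bytes, so this is only a simplification that happens to work here because 0xC3 does not occur as an operand byte before the false return.

```c
#include <stddef.h>

/* Bytes of the DISASM_FALSERET expansion and function epilogue, taken from
 * the listing above; the first byte sits at address 0x08048357. */
const unsigned char falseret_bytes[] = {
    0x51, 0x53, 0x52,                   /* push ecx; push ebx; push edx */
    0x89, 0xE3, 0x89, 0xEC,             /* mov ebx, esp; mov esp, ebp   */
    0x5D, 0x59,                         /* pop ebp; pop ecx             */
    0x8D, 0x15, 0x69, 0x83, 0x04, 0x08, /* lea edx, [0x08048369]        */
    0x52, 0xC3,                         /* push edx; ret (false return) */
    0x0F,                               /* junk byte                    */
    0x51, 0x55,                         /* push ecx; push ebp           */
    0x89, 0xE5, 0x89, 0xDC,             /* mov ebp, esp; mov esp, ebx   */
    0x5A, 0x5B, 0x59,                   /* pop edx; pop ebx; pop ecx    */
    0xC9, 0xC3                          /* leave; ret (real return)     */
};

/* Hypothetical helper: offset of the first 0xC3 byte, or len if none.
 * Models a tool that ends the function at the first return it sees. */
size_t first_ret_offset(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0xC3)
            return i;
    return len;
}
```

For this array the helper returns 16, which corresponds to address 08048367, the point at which IDA Pro closes my_func. The remaining 12 bytes, including the real epilogue, fall outside the function and show up as data.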
A disassembler that does not follow the control flow will not be
affected by the false return trick, as the following output from
objdump
demonstrates:
8048357: 51 push %ecx
8048358: 53 push %ebx
8048359: 52 push %edx
804835a: 89 e3 mov %esp,%ebx
804835c: 89 ec mov %ebp,%esp
804835e: 5d pop %ebp
804835f: 59 pop %ecx
8048360: 8d 15 69 83 04 08 lea 0x8048369,%edx
8048366: 52 push %edx
8048367: c3 ret
8048368: 0f 51 55 89 sqrtps 0xffffff89(%ebp),%xmm2
804836c: e5 89 in $0x89,%eax
804836e: dc 5a 5b fcompl 0x5b(%edx)
8048371: 59 pop %ecx
8048372: c9 leave
8048373: c3 ret
The false return at address 08048367 does not affect the subsequent
disassembly, although the misalignment trick at address 08048368 does
cause the next three instructions to be disassembled incorrectly.
This provides an example of how two simple techniques can be combined
to create an inaccurate disassembly in different types of
disassemblers.