 # Practical Reverse Engineering Solutions – Page 17

my go at the exercises on page 17

This blog post presents my solutions to exercises from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

## Exercise 1

Given what you learned about `CALL` and `RET`, explain how you would read the value of `EIP`? Why can’t you just do `MOV EAX, EIP`?

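`MOV EAX, EIP` does not work because `EIP` is not an ordinary general-purpose register: 32-bit x86 has no instruction encoding that names `EIP` as an operand. There is also rarely a need to read `EIP` directly, since the processor updates it for you as instructions execute.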

The `CALL` instruction pushes the return address onto the stack before jumping to the function address. The return address is the address of the instruction following the `CALL`, i.e., the value `EIP` will have once the function returns. On entry to the function, the return address therefore sits at the top of the stack, at `[ESP]`. We can get this value of `EIP` by calling a dummy function `read_eip` and then copying the value from the stack into a register, e.g., `EAX`:

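```
SECTION  .data
SECTION  .text
GLOBAL _start
_start:
nop
call read_eip       ; pushes the return address, then jumps to read_eip
mov  ebx,0          ; exit status 0
mov  eax,1          ; syscall number 1 = sys_exit
int 080h

read_eip:
mov eax, [esp]      ; the return address is at the top of the stack
ret
```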

Let’s test the code with gdb. The value of `EIP` before calling `read_eip` is `0x8048061`:

```
$ nasm -f elf32 -g -F dwarf code.asm
$ ld -m elf_i386 -o code code.o
$ gdb -q code
(gdb) set disassemble-next-line on
(gdb) break *_start
Breakpoint 1 at 0x8048060: file code.asm, line 5.
(gdb) run
Starting program: /home/jb/pre/chapter_1/page_17/exercise_1/code

Breakpoint 1, _start () at code.asm:5
5	    nop
=> 0x08048060 <_start+0>:	90	nop
(gdb) s
=> 0x08048061 <_start+1>:	e8 0c 00 00 00	call   0x8048072 <read_eip>
(gdb) p/x $eip
$1 = 0x8048061
```

If we inspect `EAX` right after the function returns, we get the value `0x8048066`, which is now also the value of `EIP`:

```
(gdb) s
_start () at code.asm:7
7	    mov  ebx,0
=> 0x08048066 <_start+6>:	bb 00 00 00 00	mov    $0x0,%ebx
(gdb) p/x $eax
$3 = 0x8048066
(gdb) p/x $eip
$3 = 0x8048066
```

So what we actually read is the `EIP` after the `CALL`: the return address, which is 5 bytes greater than the address of the `CALL` itself. Those 5 bytes are the length of the `CALL` instruction (the opcode `e8` plus a 4-byte displacement).
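For comparison, here is a minimal sketch of the same trick in C. It assumes GCC or Clang, which provide the builtin `__builtin_return_address(0)`; the builtin yields the return address of the current function, which is the value `read_eip` copies from `[ESP]`. The name `read_ip` is made up for this sketch and is not from the book.

```c
#include <stdint.h>

/* Returns the address at which the caller resumes, i.e. the value that
 * CALL pushed onto the stack. noinline keeps a real call in place.
 * Assumption: GCC or Clang (both provide __builtin_return_address). */
__attribute__((noinline))
uintptr_t read_ip(void)
{
    return (uintptr_t)__builtin_return_address(0);
}
```

Calling `read_ip()` from `main` and printing the result with `%p` should give an address just past the call site, which can be compared against the disassembly in gdb.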

## Exercise 2

Come up with at least two code sequences to set `EIP` to 0xAABBCCDD

I know three instructions that manipulate the `EIP`:

1. `RET`
2. `JMP`
3. `CALL`

### Version 1 – Based on RET

The `RET` instruction pops the double word at the top of the stack (at `[ESP]`) into `EIP`, i.e., it jumps to the address stored there. Pushing the desired address and then executing `RET` therefore sets `EIP`:

```
SECTION  .data
SECTION  .text
GLOBAL _start
_start:
nop
push 0AABBCCDDh
ret
```

We can check with the GNU debugger:

```
(gdb) s
6	    push 0AABBCCDDh
(gdb) p/x $eip
$1 = 0x8048061
(gdb) s
_start () at version_1.asm:7
7	    ret
(gdb) s
0xaabbccdd in ?? ()
(gdb) p/x $eip
$2 = 0xaabbccdd
```

### Version 2 – Based on JMP

Instead of pushing the address on the stack and using `RET` to jump to an address, doing a plain `JMP` also works:

```
SECTION  .data
SECTION  .text
GLOBAL _start
_start:
nop
jmp 0AABBCCDDh
```

Again let’s check with the GNU debugger:

```
(gdb) s
6	    jmp 0AABBCCDDh
(gdb) p/x $eip
$1 = 0x8048061
(gdb) s
0xaabbccdd in ?? ()
(gdb) p/x $eip
$2 = 0xaabbccdd
```

### Version 3 – Based on CALL

`CALL` works similarly to `JMP`; compared to version 2, it additionally pushes the return address onto the stack, which is unnecessary here:

```
SECTION  .data
SECTION  .text
GLOBAL _start
_start:
nop
call 0AABBCCDDh
```

In the GNU debugger:

```
(gdb) s
6	    call 0AABBCCDDh
(gdb) p/x $eip
$1 = 0x8048061
(gdb) s
0xaabbccdd in ?? ()
(gdb) p/x $eip
$2 = 0xaabbccdd
```

## Exercise 3

In the example function, `addme`, what would happen if the stack pointer were not properly restored before executing `RET`?

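You can see the `addme` function below, together with a small test harness. The instruction in question, `mov esp, ebp`, is marked with a comment:

```
SECTION  .data
SECTION  .text
GLOBAL _start
_start:
nop
mov eax, 7
mov ecx, 5
_before:
push eax
push ecx
call addme                  ; call the function with the two arguments
add esp, 8                  ; caller removes the two arguments again
_after:
mov  ebx,0
mov  eax,1
int 080h

addme:
push ebp
mov ebp, esp
movsx eax, word [ebp+8]     ; first argument
movsx ecx, word [ebp+0Ch]   ; second argument
add eax, ecx                ; result is returned in eax
mov esp, ebp                ; <-- restores the stack pointer
pop ebp
retn
```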

The restore is part of the function epilogue, which is standard for C-style functions. Resetting `ESP` ensures that values pushed onto the stack within the function, but never popped, do not interfere with the `RET` instruction. If, for instance, the function pushed a value and never removed it, `pop ebp` would load that stale value into `EBP`, and `RET` would then pop whatever lies on top of the stack (here the saved `EBP`) and jump to it instead of the real return address, most likely crashing the program. Restoring `ESP` prevents this. If the function leaves the stack balanced, however, there is no need to back up and restore `ESP`. In the present `addme` function, no instruction modifies `ESP` between the prologue and the epilogue, so removing `mov esp, ebp` has no effect.

Here’s validation with the GNU debugger, first with the restore instruction:

```
$ gdb -q addme_with_restore
(gdb) break *_before
Breakpoint 1 at 0x804806b: file addme_with_restore.asm, line 9.
(gdb) break *_after
Breakpoint 2 at 0x8048075: file addme_with_restore.asm, line 14.
(gdb) run

Breakpoint 1, _before () at addme_with_restore.asm:9
9	    push eax
(gdb) p/x $esp
$1 = 0xffffd000
(gdb) c
Continuing.

Breakpoint 2, _after () at addme_with_restore.asm:14
14	    mov  ebx,0
(gdb) p/x $esp
$2 = 0xffffd000
```

and the same without the restore instruction:

```
Breakpoint 1, _before () at addme_without_restore.asm:9
9	    push eax
(gdb) p/x $esp
$1 = 0xffffd000
(gdb) c
Continuing.

Breakpoint 2, _after () at addme_without_restore.asm:14
14	    mov  ebx,0
(gdb) p/x $esp
$2 = 0xffffd000
```

## Exercise 4

In all of the calling conventions explained, the return value is stored in a 32-bit register (`EAX`). What happens when the return value does not fit in a 32-bit register? Write a program to experiment and evaluate your answer. Does the mechanism change from compiler to compiler?

I use the following C code:

```
#include <stdio.h>

struct data
{
    int n1;
    int n2;
};

struct data test_return(void) {
    struct data test_object;
    test_object.n1 = 7;
    test_object.n2 = 5;
    return test_object;
}

int main (int argc, char *argv[] )
{
    struct data ret;
    ret = test_return();
    int res = (ret.n1 + ret.n2);
    return res;
}
```

The `struct` contains two `int` members and is therefore 64 bits wide (with 32-bit `int`), too large for a single 32-bit register. I use `gcc` to compile the code:

`gcc -fno-asynchronous-unwind-tables -masm=intel -Os -S -m32 code.c`

The full output is on GitHub, here’s the function excerpt:

```
test_return:
push    ebp
mov     ebp, esp
mov     eax, DWORD PTR [ebp+8]
mov     DWORD PTR [eax], 7
mov     DWORD PTR [eax+4], 5
pop     ebp
ret     4
```

- `push ebp` and `mov ebp, esp` are the standard function prologue.
- `mov eax, DWORD PTR [ebp+8]` loads the value passed on the stack at `[EBP+8]`.
- The next two `mov` instructions store the values 7 and 5 at the location referenced by `EAX`, i.e., at the address that was passed at `[EBP+8]`.
- `pop ebp` and `ret 4` are the function epilogue; `ret 4` also removes the 4-byte pointer argument from the stack.

The return value is placed in memory at the address passed on the stack at `[EBP+8]`. So in order to use the function, the caller needs to reserve space for the struct in memory and put its address on the stack before calling the function. Compiling the C code with the `-Os` flag produces assembly code where the function is never called (since the return value is always 12). To see the call, I recompiled the code with `-O0`. The function now contains unnecessary `mov` statements, but is in essence the same (see GitHub for the full output). The main function now does call the function:

```
main:
push	ebp
mov	ebp, esp
sub	esp, 20
lea	eax, [ebp-8]
mov	DWORD PTR [esp], eax
call	test_return
```

In line 4, the instruction `sub esp, 20` reserves 20 bytes on the stack. The next two instructions compute the address `EBP-8` and store it at the top of the stack. Since `ESP = EBP-20` at this point, the value at the top of the stack is the address of the stack memory at `ESP+12`. Inside `test_return`, the `CALL` has pushed the return address and the prologue has pushed `EBP`, so the function's `EBP` lies 8 bytes below the caller's `ESP`. Before the function epilogue, i.e., after `mov DWORD PTR [eax+4], 5`, `EAX` holds the value stored at `[EBP+8]`, which is the address of the stack slot at `EBP+20`. The function places the member `n1` of the struct at `EAX` (= `EBP+20`) and the member `n2` at `EAX+4` (= `EBP+24`).

So, long story short, the function writes its return value to stack memory owned by the caller and returns the address of that memory in `EAX`. The caller has to reserve the necessary space on its stack and pass the address of that space to the function as a hidden argument. The caller therefore does not need to inspect the returned address; it already knows it.
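Written out by hand, the mechanism looks roughly like the following C sketch, where the hidden pointer becomes an explicit parameter. The names `test_return_explicit` and `sum_via_hidden_pointer` are invented for illustration; the compiler generates the equivalent code silently for the original `test_return`.

```c
struct data {
    int n1;
    int n2;
};

/* Hand-written equivalent of the generated code: the caller passes the
 * address of the result buffer, the callee fills it in and hands the
 * same address back (in EAX, in the assembly listings). */
struct data *test_return_explicit(struct data *result)
{
    result->n1 = 7;
    result->n2 = 5;
    return result;
}

/* Caller side: reserve the space, pass its address, read the fields. */
int sum_via_hidden_pointer(void)
{
    struct data ret;                              /* space on caller's stack */
    struct data *p = test_return_explicit(&ret);  /* p == &ret */
    return p->n1 + p->n2;
}
```

Here `sum_via_hidden_pointer()` yields 12, the same value `main` returns in the original program.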

I got very similar results with Clang. Again the caller reserves space for the structure and stores its address at the top of the stack (`lea edx, dword ptr [ebp - 32]` and `mov dword ptr [esp], edx`):

```
sub	esp, 40
mov	eax, dword ptr [ebp + 12]
mov	ecx, dword ptr [ebp + 8]
lea	edx, dword ptr [ebp - 32]
mov	dword ptr [ebp - 4], 0
mov	dword ptr [ebp - 8], ecx
mov	dword ptr [ebp - 12], eax
mov	dword ptr [esp], edx
call	test_return
```

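Clang stores more values on the stack: the extra `mov` instructions copy `main`'s arguments (`argc` at `[ebp + 8]`, `argv` at `[ebp + 12]`) into local slots, which is typical for unoptimized code. The function looks almost the same as for GCC: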

```
push	ebp
mov	ebp, esp
sub	esp, 8
mov	eax, dword ptr [ebp + 8]
mov	dword ptr [ebp - 8], 7
mov	dword ptr [ebp - 4], 5
movsd	xmm0, qword ptr [ebp - 8]
movsd	qword ptr [eax], xmm0
pop	ebp
ret	4
```

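Unlike GCC, Clang does not write the members directly into the caller's buffer. It first reserves 8 bytes of local stack space with `sub esp, 8` (line 3) and builds the struct there, at `[EBP-8]` and `[EBP-4]`, i.e., at lower addresses than `EBP`. It then copies all 8 bytes in one go through `XMM0` into the buffer whose address the caller passed at `[EBP+8]` (the two `movsd` instructions). So the mechanism is the same for both compilers: the caller provides the memory and the callee fills it in.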
