# Practical Reverse Engineering Solutions – Page 11

My go at exercise 1 on page 11

This blog post presents my solutions to exercises from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

## Problem Statement

This function uses a combination of `SCAS` and `STOS` to do its work. First, explain what the types of `[EBP+8]` and `[EBP+C]` in lines 1 and 8, respectively, are. Next, explain what this snippet does:

```
01: 8B 7D 08    mov edi, [ebp+8]
02: 8B D7       mov edx, edi
03: 33 C0       xor eax, eax
04: 83 C9 FF    or ecx, 0FFFFFFFFh
05: F2 AE       repne scasb
06: 83 C1 02    add ecx, 2
07: F7 D9       neg ecx
08: 8A 45 0C    mov al, [ebp+0Ch]
09: 8B FA       mov edi, edx
10: F3 AA       rep stosb
11: 8B C2       mov eax, edx
```

## Context of the Snippet

The function snippet probably gets its parameters in C style (the cdecl calling convention). With this convention, the caller pushes the function parameters onto the stack before the call is made. The parameters are pushed in reverse order from the prototype of the function, i.e., the last parameter is pushed first. The `CALL` then pushes the return address (the value of `EIP` after the call instruction) onto the stack. Finally, the standard function prologue pushes the base pointer onto the stack and sets the value of `EBP` to the stack pointer `ESP`. This leads to the following stack image before line 1 of the exercise snippet is executed (see left-hand side):

(Figure: stack layout before line 1. Left: general layout. Right: layout with the example values.)

• In the following analysis we see that `[EBP+8]` (the first function parameter) is of type `char *`, i.e., a pointer to a sequence of bytes. The function snippet requires that the sequence is terminated by a zero byte, so it probably is a null-terminated string.
• The value at `[EBP+C]` (the second function parameter) is of type `char`, i.e., a single byte like a letter.

I’m using the string “The pool on the roof must have a leak.” (with null byte at the end) as argument 1 at `[EBP+8]` and character `'x'` for the second parameter at `[EBP+C]`. See the right stack in the above figure. Note that while `'x'` itself is placed at `EBP+C`, the slot at `EBP+8` contains a memory address pointing to the first letter of the string.
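To make the offsets concrete, here is a minimal C sketch of the frame layout described above. It is my own illustration and not part of the book: the names `param_offset`, `read_param` and `check_frame` are made up, the return address in the test frame is a placeholder, and the sketch assumes a 32-bit cdecl frame where every stack slot is 4 bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, counted in bytes from EBP:
 *   [EBP+0]  saved EBP       (pushed by the prologue)
 *   [EBP+4]  return address  (pushed by CALL)
 *   [EBP+8]  first parameter
 *   [EBP+C]  second parameter */
#define SLOT_SIZE        4u
#define FIRST_PARAM_OFF  8u

/* Byte offset from EBP of parameter n (n = 0 is the first parameter). */
uint32_t param_offset(uint32_t n)
{
    return FIRST_PARAM_OFF + SLOT_SIZE * n;
}

/* Read parameter n from a simulated frame whose element 0 sits at EBP. */
uint32_t read_param(const uint32_t *frame_at_ebp, uint32_t n)
{
    assert(frame_at_ebp != NULL);
    return frame_at_ebp[param_offset(n) / SLOT_SIZE];
}

/* Self-check using the example values of this post. */
void check_frame(void)
{
    /* saved EBP, return address (placeholder), string pointer, character */
    uint32_t frame[4] = { 0x0u, 0x0u, 0x080490c0u, (uint32_t)'x' };
    assert(param_offset(0) == 0x8u);
    assert(param_offset(1) == 0xCu);
    assert(read_param(frame, 0) == 0x080490c0u);
    assert(read_param(frame, 1) == (uint32_t)'x');
}
```

Calling `check_frame()` from a `main` function runs the assertions. The character occupies a full 4-byte slot even though only its lowest byte is meaningful.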

To check my guesses of what the code snippet does, I put the function prologue and epilogue around it and added a caller to get a fully functional assembly program (GitHub link):

```
SECTION  .data
my_str:
db     'The pool on the roof must have a leak.', 0
SECTION  .text
GLOBAL _start
_start:
nop
push byte 'x'      ; second function parameter
push dword my_str  ; first function parameter
call black_out     ; call function
add esp, 8         ; cleaning out the stack
mov  ebx,0         ; parameter for exit call (return value)
mov  eax,1         ; exit system call
int 080h           ; run system call, see page 79 pal

black_out:
push ebp           ; function prologue, save stack base pointer
mov ebp, esp       ; point base pointer to ESP
; ------------ start code from book ---------
mov edi, [ebp+8]
mov edx, edi
xor eax, eax
or ecx, 0FFFFFFFFh
repne scasb
add ecx, 2
neg ecx
mov al, [ebp+0Ch]
mov edi, edx
rep stosb
mov eax, edx
; ------------ end code from book -----------
mov esp, ebp       ; restore stack pointer
pop ebp            ; restore stack base pointer
ret
```

I assembled and linked the code on a 64-bit machine with:

```
$ nasm -f elf32 -g -F dwarf code.asm
$ ld -m elf_i386 -o code code.o
```

and started debugging with:

```
$ gdb -q code
Reading symbols from code...done.
(gdb) break *_start
Breakpoint 1 at 0x8048080: file code.asm, line 7.
(gdb) run
Starting program: /home/jb/pre/chapter_1/page_11/exercise_1/code

Breakpoint 1, _start () at code.asm:7
7	    nop
```

The caller first pushes the second function parameter `'x'` on the stack:

```
(gdb) s
8	    push byte 'x'
(gdb) s
9	    push dword my_str
(gdb) x/cb $esp
0xffffcfec:	120 'x'
```

Then it pushes the first parameter `"The pool on the roof must have a leak."`:

```
(gdb) s
10	    call black_out
```

In contrast to the second parameter, the stack value is a pointer to the string in memory. The command `x/xw $esp` gives the value in memory referenced by `ESP`:

```
(gdb) x/xw $esp
0xffffcfe8:	0x080490c0
```

So the string is stored at `0x080490c0`:

```
(gdb) x/s 0x080490c0
0x80490c0 <my_str>:	"The pool on the roof must have a leak."
```

The next three instructions call the function and run the function prologue:

```
(gdb) s
17	    push ebp
(gdb) p/x $ebp
$1 = 0x0
(gdb) s
18	    mov ebp, esp
(gdb) p/x $esp
$1 = 0xffffcfe0
(gdb) s
black_out () at code.asm:20
20	    mov edi, [ebp+8]
```

After that we enter the snippet, which is analyzed step by step in the next section.

## Walk-Through

### ► Line 1: `mov edi, [ebp+8]`

As discussed before, `[ebp+8]` is a value on the stack representing the first function parameter (see right-hand side of the stack image). This instruction copies the parameter, a pointer to the string, to register `EDI`. Now `EDI` references our string:

```
(gdb) x/s $edi
0x80490c0 <my_str>:	"The pool on the roof must have a leak."
```

### ► Line 2: `mov edx, edi`

This simply makes a copy of `EDI`. The reason for that will be clear in line 5. For reference, `EDI` and `EDX` contain the double word `0x80490c0`:

```
(gdb) p/x $edi
$5 = 0x80490c0
```

### ► Line 3: `xor eax, eax`

This sets the value of `EAX` to zero:

```
(gdb) p/x $eax
$6 = 0x0
```

Again, the purpose of this will be clear in line 5.

### ► Line 4: `or ecx, 0FFFFFFFFh`

This sets the value of `ECX` to `0xFFFFFFFF`:

```
(gdb) p/x $ecx
$7 = 0xffffffff
```

We interpret `ECX` as a signed integer `-1`:

```
(gdb) p/d $ecx
$7 = -1
```

The register `ECX` is used in the next instruction.
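The two `gdb` printouts above show the same 32 bits read in two different ways. As a small illustration (my own sketch, with the made-up names `set_all_bits` and `check_line4`, and assuming a two's-complement target), the effect of line 4 can be modelled in C like this:

```c
#include <assert.h>
#include <stdint.h>

/* Model of `or ecx, 0FFFFFFFFh`: OR-ing with all ones sets every bit,
 * no matter what ECX contained before. */
uint32_t set_all_bits(uint32_t ecx)
{
    return ecx | 0xFFFFFFFFu;
}

/* Self-check mirroring the two gdb printouts. */
void check_line4(void)
{
    uint32_t ecx = set_all_bits(0x12345678u);  /* arbitrary previous value */
    assert(ecx == 0xFFFFFFFFu);                /* p/x $ecx */
    /* Conversion to signed is implementation-defined before C23;
     * on two's-complement targets the result is -1. */
    assert((int32_t)ecx == -1);                /* p/d $ecx */
}
```

Read as an unsigned repeat counter, the same value is 4294967295, the largest count `ECX` can hold.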

### ► Line 5: `repne scasb`

Line 5 is where a lot of the magic happens. The instruction `scasb` compares the byte in `AL` (the low byte of `EAX`) with the byte at the address in `EDI` and then increases `EDI` by one (assuming the direction flag is clear). The `repne` prefix repeats this, decreasing `ECX` by one after each comparison, until the bytes match or `ECX` reaches zero.

In our example, we search for the null byte (in `AL`) in the null-terminated string “The pool on the roof must have a leak.” (referenced by `EDI`). The counter `ECX` starts from -1. The scan touches 39 bytes (38 characters plus the null byte), so `ECX` is decreased 39 times.

(Figure: registers before and after `repne scasb`.)

So `ECX` ends up being `-40`:

```
(gdb) p/d $ecx
$8 = -40
```

The value of `EDI` changes too, that’s why in line 2 we made a copy of the value:

```
(gdb) p/x $edi
$9 = 0x80490e7
```

(the start of the string is at `0x80490c0`).
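For readers who prefer C, the behaviour of `repne scasb` can be modelled with a short loop. This is only a sketch under my own assumptions: the `struct regs` type and the function names are invented for illustration, flags are not modelled, and the direction flag is assumed to be clear so that `EDI` counts upwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers used by the instruction (illustrative model only). */
struct regs {
    uint8_t        al;   /* byte to search for    */
    uint32_t       ecx;  /* repeat counter        */
    const uint8_t *edi;  /* current scan position */
};

/* Model of `repne scasb`: while ECX != 0, compare AL with [EDI],
 * then advance EDI and decrement ECX; stop after a match. */
void repne_scasb(struct regs *r)
{
    assert(r != NULL && r->edi != NULL);
    while (r->ecx != 0) {
        int match = (*r->edi == r->al);
        r->edi++;
        r->ecx--;
        if (match)
            break;
    }
}

/* Self-check with the example string of this post. */
void check_line5(void)
{
    const char *s = "The pool on the roof must have a leak.";
    struct regs r = { 0, 0xFFFFFFFFu, (const uint8_t *)s };
    repne_scasb(&r);
    assert(r.ecx == 0xFFFFFFD8u);              /* -40 when read as signed */
    assert(r.edi == (const uint8_t *)s + 39);  /* 0x27 bytes past the start */
}
```

Note that `EDI` and `ECX` are updated for the matching byte as well, which is why `EDI` ends up one byte past the null terminator (`0x80490c0 + 0x27 = 0x80490e7`).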

### ► Line 6: `add ecx, 2`

Add 2 to `ECX` so `ECX` becomes -38:

```
(gdb) p/d $ecx
$10 = -38
```

This corresponds to -1 times the length of the string. Adding two compensates for firstly not starting to count down from 0 (remember we started at -1), and secondly also counting the null byte.

### ► Line 7: `neg ecx`

This simply negates the value of `ECX`, so now it actually corresponds to the string length:

```
(gdb) p/d $ecx
$11 = 38
```

To summarize: Up to and including line 7, the snippet actually calculates the length of the string passed at `[EBP+8]`.
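The arithmetic behind lines 4 to 7 can be checked in a few lines of C. For a string of length n, the scan performs n + 1 comparisons, so the counter ends at -1 - (n + 1) = -(n + 2); adding 2 gives -n and negating gives n. The function names below are my own and the sketch relies on 32-bit wrap-around arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Lines 6 and 7 applied to the counter left behind by `repne scasb`
 * (which started counting down from 0xFFFFFFFF). */
uint32_t length_from_counter(uint32_t ecx_after_scan)
{
    uint32_t ecx = ecx_after_scan + 2u;  /* add ecx, 2 */
    return 0u - ecx;                     /* neg ecx    */
}

/* Self-check with the values from the gdb session. */
void check_lines6_7(void)
{
    assert(length_from_counter(0xFFFFFFD8u) == 38u);  /* -40 -> 38 */
    assert(length_from_counter(0xFFFFFFFEu) == 0u);   /* empty string */
    /* The same result written as a bitwise NOT followed by a decrement. */
    assert(length_from_counter(0xFFFFFFD8u) == (uint32_t)(~0xFFFFFFD8u) - 1u);
}
```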

### ► Line 8: `mov al, [ebp+0Ch]`

Starting with line 8, we enter the second part of the snippet. This instruction copies the byte at stack location `[EBP+0Ch]`, i.e., the second function parameter, to register `AL`. Since the second parameter is of type `char` (only one byte in size), the value fits in the lower 8 bits of the `EAX` register. `AL` now holds the character `'x'`:

```
(gdb) p/c $al
$12 = 120 'x'
```
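Writing to `AL` only changes the lowest byte of `EAX`; the upper 24 bits keep whatever they held before (here zero, from line 3). A minimal C model of that, with the hypothetical names `mov_al` and `check_line8`:

```c
#include <assert.h>
#include <stdint.h>

/* Model of `mov al, <byte>`: only bits 0..7 of EAX change. */
uint32_t mov_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}

/* Self-check: 'x' is 0x78 (decimal 120). */
void check_line8(void)
{
    /* EAX was zeroed in line 3, so the whole register becomes 0x78. */
    assert(mov_al(0x00000000u, 'x') == 0x00000078u);
    /* With stale upper bits, those bits would survive the write. */
    assert(mov_al(0xDEADBE00u, 'x') == 0xDEADBE78u);
}
```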

### ► Line 9: `mov edi, edx`

The instruction following in line 10 again operates on `EDI`. Since line 5 modified the value and it no longer points to the start of the string, we restore it from the backup in `EDX` that we created in line 2. After that, `EDI` should once again point to the string:

```
(gdb) p/x $edi
$13 = 0x80490c0
```

(compare `0x80490c0` to the output in line 2).

### ► Line 10: `rep stosb`

Again a very powerful instruction. It copies the byte in `AL` (in our case the character `'x'`) to every byte in the sequence starting at `EDI` (in our case the string “The pool on the roof must have a leak.”). It does it exactly `ECX` times (so in our case for the entire length of the string). In other words, this instruction does a `memset`, effectively overwriting the entire string with a single character. After the instruction, the content of our string is blacked out by `'x'`s:

```
(gdb) x/s $edx
0x80490c0 <my_str>:	'x' <repeats 38 times>
```

(The instruction again modifies `EDI`, so you have to use `EDX` to reference the string.)
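The behaviour of `rep stosb` can be modelled in C along the same lines. Again this is just a sketch with invented names (`struct regs`, `rep_stosb`, `check_line10`), assuming the direction flag is clear:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Registers used by the instruction (illustrative model only). */
struct regs {
    uint8_t   al;   /* byte to store          */
    uint32_t  ecx;  /* repeat counter         */
    uint8_t  *edi;  /* current write position */
};

/* Model of `rep stosb`: while ECX != 0, store AL at [EDI],
 * then advance EDI and decrement ECX. */
void rep_stosb(struct regs *r)
{
    assert(r != NULL && r->edi != NULL);
    while (r->ecx != 0) {
        *r->edi = r->al;
        r->edi++;
        r->ecx--;
    }
}

/* Self-check with the example string and the length from line 7. */
void check_line10(void)
{
    char buf[] = "The pool on the roof must have a leak.";
    struct regs r = { 'x', 38u, (uint8_t *)buf };
    rep_stosb(&r);
    assert(strlen(buf) == 38);          /* null terminator is untouched */
    assert(strspn(buf, "x") == 38);     /* every character is now 'x'   */
    assert(r.ecx == 0u && r.edi == (uint8_t *)buf + 38);
}
```

Because the count in `ECX` is the string length without the null byte, the terminator survives and the result is still a valid C string.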

### ► Line 11: `mov eax, edx`

This copies the address of the string to `EAX`. `EAX` holds the return value of the function, so the snippet returns a pointer to the modified string.

## C-Code

The walk-through demonstrated that the function overwrites every character in the string passed as the first function parameter with the character passed as the second parameter. Here is working C code, where the function `black_out` corresponds to the snippet in this exercise:

```
#include <stdio.h>

char *black_out(char *str, char ch)
{
    /* find length of string */
    int len = 0;
    char *str_cpy = str;
    while (*str_cpy != '\0') {
        len++;
        str_cpy++;
    }
    /* set each character of string to <ch> */
    while (len-- > 0) {
        str[len] = ch;
    }
    return str;
}

int main(int argc, char *argv[])
{
    if (argc != 3) {
        /* argv[0] is the program name */
        printf("usage: %s string character\n", argv[0]);
        return 1;
    }
    /* argv[1] is the string, the first byte of argv[2] is the fill character */
    char *test2 = black_out(argv[1], *argv[2]);
    printf("%s\n", test2);
    return 0;
}
```

The function can be simplified by using the `strlen` and `memset` functions:

```
#include <string.h>

char *black_out(char *str, char ch)
{
    /* find length of string */
    size_t len = strlen(str);
    /* set each character of string to <ch> */
    memset(str, ch, len);
    return str;
}
```
