# Practical Reverse Engineering Solutions – Page 78 (Part III)my go at mystery7, mystery8 and mystery 9 on pages 78ff

This blog post presents my solution to exercises 7 to 9 on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

For the code in each exercise, do the following in order (whenever possible):

• Determine whether it is in Thumb or ARM state.
• Explain each instruction’s semantic. If the instruction is `LDR/STR`, explain the addressing mode as well.
• Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.
• Recover the function prototype.
• Identify the function prologue and epilogue.
• Explain what the function does and then write pseudo-code for it.
• Decompile the function back to C and give it a meaningful name.

## Mystery 7

Figure 2-13 illustrates a common routine, but you may not have seen it
implemented this way.

This is the code from Figure 2-13 (note that the listing in the book has a typo in line 13, `00 2B CMB R3, #0` should of course read `00 2B CMP R3, #0`):

```mystery7
02 46            MOV R2, R0
08 B9            CBNZ R0, loc_100E1D8
00 20            MOVS R0, #0
70 47            BX LR
loc_100E1D8
90 F9 00 30      LDRSB.W R3, [R0]
02 E0            B loc_100E1E4
loc_100E1DE
92 F9 00 30      LDRSB.W R3, [R2]
loc_100E1E4
00 2B            CMP R3, #0
FA D1            BNE loc_100E1DE
10 1A            SUBS R0, R2, R0
6F F3 9F 70      BFC.W R0, #0x1E, #2
70 47            BX LR
; End of function mystery7
```

### ARM or Thumb

The code is in Thumb state: The snippet uses 16bit and 32bit instructions and some instructions have the `.W` suffix.

### Instruction Semantic

The only non common instruction is `BFC.W R0, #0x1E, #2` which does a bit field clear. The instruction sets the two most significant bits to zero, so `0xFFFFFFFF` would become `0x3FFFFFFF`.

### Types

The function only takes one argument `arg1 = R0`. The loop in lines 9 to 14 iterate over bytes of `arg1` (`LDRSB.W` in line 11 accesses a single byte, and `ADDS R2, #1` in line 10 increments the array index by one byte). Furthermore, the loop ends if in line 13 an array element is . This indicates that `arg1` is a pointer to a null terminated string. The function returns an unsigned int.

### Function Prototype

The function prototype is:

`UNSIGNED INT mystery7(CHAR*);`

### Prologue and Epilogue

There is no prologue or epilogue. The function does not overwrite any registers except `R0`, `R2` and `R3`. It returns with `BX LR` which switches back to ARM state.

### Purpose and Pseudo-code

The function searches for the null byte in string `arg1`. Line 15 `SUBS R0, R2, R0` computes the difference between the address of the null byte and the address of the start of the string. This corresponds to the length of the string. The function implements strlen. I don’t understand the purpose of setting the two most significant bits of the difference to zero. Those bits shouldn’t be set in the first place for any reasonable strings.

```FUNCTION mystery7(str)
len = 0
WHILE str[len] != 0 DO
len = len + 1
ENDWHILE
RETURN len
END
```

### C-Code

```unsigned int strlen(const char *s) {
unsigned int len = 0;
while( s[len] != '\0' )
len++;
return len;
}
```

## Mystery 8

In Figure 2-14, byteArray is a 256-character array whose content is `byteArray[] = {0, 1, …, 0xff}.`

This is the code from Figure 2-14:

```mystery8
2D E9 78 48      PUSH.W {R3–R6,R11,LR}
0D F2 10 0B      ADDW R11, SP, #0x10
0C 4E            LDR R6, =byteArray
09 E0            B loc_100E34C
loc_100E338
05 78            LDRB R5, [R0]
01 3A            SUBS R2, #1
4D B1            CBZ R5, loc_100E352
0B 78            LDRB R3, [R1]
9C 5D            LDRB R4, [R3,R6]
AB 5D            LDRB R3, [R5,R6]
A3 42            CMP R3, R4
04 D1            BNE loc_100E352
loc_100E34C
00 2A            CMP R2, #0
F3 DC            BGT loc_100E338
01 3A            SUBS R2, #1
loc_100E352
00 2A            CMP R2, #0
01 DA            BGE loc_100E35A
00 20            MOVS R0, #0
04 E0            B locret_100E364
loc_100E35A
0B 78            LDRB R3, [R1]
9A 5D            LDRB R2, [R3,R6]
03 78            LDRB R3, [R0]
9B 5D            LDRB R3, [R3,R6]
98 1A            SUBS R0, R3, R2
locret_100E364
BD E8 78 88      POP.W {R3–R6,R11,PC}
; End of function mystery8
```

### ARM or Thumb

The code is in Thumb state:

• The snippet uses 16bit and 32bit instructions.
• Some instructions have the `.W` suffix.
• The `CBZ` instruction is only available in Thumb state.
• The code uses `PUSH.W` and `POP.W` as function prologue and epilogue

### Instruction Semantic

Nothing special, except maybe `LDR R6, =byteArray` which is a pseudo-instruction that sets `R6` to point to the array `{0,1,2,...,255}`

### Types

The function takes three arguments. `arg1 = R0` and `arg2 = R1` are both used in an array fashion: Lines 6 to 19 iterate over bytes of those to arrays. The code also compares elements of `arg1` to `arg2`, so the two parameters are probably of the same type. In line 9 there’s a check if the elements in `arg1` are , so `arg1` and `arg2` are null terminated strings.

The third parameter `arg3` acts as a limit counter (see lines 8 and 18/19). So the type is probably an unsigned int (or any other unsigned integer type).

### Function Prototype

The function prototype is:

`UNSIGNED INT mystery8(CHAR*, CHAR*, UNSIGNED INT);`

### Prologue and Epilogue

Lines 2 and 33 preserve registers `R3-R6` and `R11`. Apart from the three function arguments `R0` to `R2` the functions doesn’t write any other registers. The `PUSH` and `POP` also save and jump to the return address respectively.

### Purpose and Pseudo-code

If found it easiest to start with the loop in line 6 to line 19. Here’s an almost one to one translation to pseudo-code:

```FUNCTION mystery8(string1, string2, limit)
byteArray = {0, 1, 2, ..., 255}
index = 0
DO
IF limit <= 0 THEN
// break to line 20

R5 = string1[index]
limit = limit -1
IF R5 == 0 THEN
// break to line 21

R4 = byteArray[string2[index]]
R3 = byteArray[string1[index]]
index = index + 1
WHILE R3 != R4
index = index - 1

// line 21
```

Line 20 just decrements `limit`, we can eliminate the instruction by moving `limit = limit-1` up and changing to a strict check `limit < 0`.

Starting with line 21 the snippet checks if the limit is not yet zero (meaning the code did take the second `BREAK`). If this is the case, the code returns a difference based on first array elements where there was a difference.

If the limit is zero, then the first `BREAK` must have been taken and the code returns .

```FUNCTION mystery8(string1, string2, limit)
byteArray = {0, 1, 2, ..., 255}
index = 0
DO
limit = limit -1
IF limit < 0 THEN
BREAK

R5 = string1[index]
IF R5 == 0 THEN
BREAK

R4 = byteArray[string2[index]]
R3 = byteArray[string1[index]]
index = index + 1
WHILE R3 != R4

// line 21
index = index - 1
IF limit >= 0 THEN
R2 = byteArray[string2[index]]
R3 = byteArray[string1[index]]
RETURN R3 - R2
ELSE
RETURN 0```

The `byteArray` is `{0,1,...,255}`, so for all bytes `n` we have `byteArray[n] = n`. We can further simplify the code by moving the last `IF/ELSE` construct inside the loop:

```FUNCTION mystery8(string1, string2, limit)
FOR index = 0 to limit - 1 DO
IF string1[index] == 0 OR string1[index] != string2[index] THEN
RETURN string1[index] - string2[index]
ENDIF
ENDFOR
RETURN 0
END
```

From this it should be obvious that the snippet implements strncmp, which compares two strings up to `limit` characters. If the strings are equal up to `limit`, the snippet returns 0. Otherwise it returns a negative number if the second string comes lexicographically after the first, and a positive number vice-versa.

Example:

string 1 string 2 limit result
“string a” “string b” 1000 -1
“string a” “string b” 5
“string a” “String b” 1000 32
“word” “wording” 5 -105

## Mystery 9

What does the function shown in Figure 2-15 do?

The code is a stripped down version of mystery8:

```mystery9
2D E9 30 48      PUSH.W {R4,R5,R11,LR}
0D F2 08 0B      ADDW R11, SP, #8
09 4D            LDR R5, =byteArray
06 E0            B loc_100E312
loc_100E304
0B 78            LDRB R3, [R1]
5A 5D            LDRB R2, [R3,R5]
63 5D            LDRB R3, [R4,R5]
93 42            CMP R3, R2
04 D1            BNE loc_100E318
loc_100E312
04 78            LDRB R4, [R0]
00 2C            CMP R4, #0
F5 D1            BNE loc_100E304
loc_100E318
0B 78            LDRB R3, [R1]
5A 5D            LDRB R2, [R3,R5]
03 78            LDRB R3, [R0]
5B 5D            LDRB R3, [R3,R5]
98 1A            SUBS R0, R3, R2
BD E8 30 88      POP.W {R4,R5,R11,PC}
; End of function mystery9
```

### ARM or Thumb

The code is in Thumb state:

• The snippet uses 16bit and 32bit instructions.
• Some instructions have the `.W` suffix.
• The code uses `PUSH.W` and `POP.W` as function prologue and epilogue

See mystery 8.

### Types

The function takes two arguments. `arg1 = R0` and `arg2 = R1`. They are used in the same manner as in mystery 8 so `arg1` and `arg2` are null terminated strings.

### Function Prototype

The function prototype is:

`UNSIGNED INT mystery9(CHAR*, CHAR*);`

### Prologue and Epilogue

The same as in mystery 8 applies, except the function doesn’t need `R6` and therefore doesn’t save and restore it.

### Purpose and Pseudo-code

We can get the pseudo-code by removing all missing parts from mystery8:

```FUNCTION mystery8(string1, string2)
index = 0
WHILE TRUE
IF string1[index] == 0 OR string1[index] != string2[index] THEN
RETURN string1[index] - string2[index]
ENDIF
index = index + 1
ENDWHILE
RETURN 0
END
```

The code implements strcmp, which is the unlimited version of strncmp from mystery8. So the comparison always checks the entire strings up to the length of the shorter one:

Example:

string 1 string 2 result
“string a” “string b” -1
“string a” “String b” 32
“word” “wording” -105