Practical Reverse Engineering Solutions – Page 78 (Part IV)

My go at mystery10 and mystery11 on pages 78ff

This blog post presents my solution to exercises 10 and 11 on page 78ff from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions, consult this progress page.

Problem Statement

For the code in each exercise, do the following in order (whenever possible):

1. Determine whether it is in Thumb or ARM state.
2. Explain each instruction’s semantic. If the instruction is `LDR`/`STR`, explain the addressing mode as well.
3. Identify the types (width and signedness) for every possible object. For structures, recover field size, type, and friendly name whenever possible. Not all structure fields will be recoverable because the function may only access a few fields. For each type recovered, explain to yourself (or someone else) how you inferred it.
4. Recover the function prototype.
5. Identify the function prologue and epilogue.
6. Explain what the function does and then write pseudo-code for it.
7. Decompile the function back to C and give it a meaningful name.

Mystery 10

Exercise 10

Figure 2-16 is a function from Windows RT. Read MSDN if needed. Ignore the security `PUSH/POP` cookie routines.

This is `mystery10` from Figure 2-16:

```mystery10
2D E9 70 48 PUSH.W {R4–R6,R11,LR}
0D F2 0C 0B ADDW R11, SP, #0xC
37 F0 CC F9 BL __security_push_cookie
84 B0 SUB SP, SP, #0x10
0D 46 MOV R5, R1
00 24 MOVS R4, #0
10 2D CMP R5, #0x10
16 46 MOV R6, R2
0C D3 BCC loc_1010786
1A 4B LDR R3, =__imp_GetSystemTime
68 46 MOV R0, SP
1B 68 LDR R3, [R3]
98 47 BLX R3
00 9B LDR R3, [SP,#0x1C+var_1C]
10 24 MOVS R4, #0x10
33 60 STR R3, [R6]
01 9B LDR R3, [SP,#0x1C+var_18]
73 60 STR R3, [R6,#4]
02 9B LDR R3, [SP,#0x1C+var_14]
B3 60 STR R3, [R6,#8]
03 9B LDR R3, [SP,#0x1C+var_10]
F3 60 STR R3, [R6,#0xC]
loc_1010786
2B 1B SUBS R3, R5, R4
04 2B CMP R3, #4
04 D3 BCC loc_1010796
11 4B LDR R3, =__imp_GetCurrentProcessId
1B 68 LDR R3, [R3]
98 47 BLX R3
30 51 STR R0, [R6,R4]
04 34 ADDS R4, #4
loc_1010796
2B 1B SUBS R3, R5, R4
04 2B CMP R3, #4
04 D3 BCC loc_10107A6
0C 4B LDR R3, =__imp_GetTickCount
1B 68 LDR R3, [R3]
98 47 BLX R3
30 51 STR R0, [R6,R4]
04 34 ADDS R4, #4
loc_10107A6
2B 1B SUBS R3, R5, R4
08 2B CMP R3, #8
09 D3 BCC loc_10107C0
07 4B LDR R3, =__imp_QueryPerformanceCounter
68 46 MOV R0, SP
1B 68 LDR R3, [R3]
98 47 BLX R3
00 9B LDR R3, [SP,#0x1C+var_1C]
32 19 ADDS R2, R6, R4
33 51 STR R3, [R6,R4]
01 9B LDR R3, [SP,#0x1C+var_18]
53 60 STR R3, [R2,#4]
08 34 ADDS R4, #8
loc_10107C0
20 46 MOV R0, R4
04 B0 ADD SP, SP, #0x10
37 F0 A4 F9 BL __security_pop_cookie
BD E8 70 88 POP.W {R4–R6,R11,PC}
; End of function mystery10
```

ARM or Thumb

The code is in Thumb state:

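• The prologue and epilogue use `PUSH.W` and `POP.W`, where the `.W` suffix selects a wide (32-bit) Thumb-2 encoding.
• The listing mixes 16-bit (two-byte) and 32-bit (four-byte) instructions. In ARM state every instruction is 32 bits wide.
• `ADDW` only exists as a Thumb-2 instruction.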

Instruction Semantic

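• The code uses `BL` and `BLX` to call subroutines. `BL` calls a fixed target, while `BLX R3` calls the address held in `R3`. `BLX` can switch between ARM and Thumb state depending on bit 0 of the target address, but Windows RT code is Thumb throughout, so the imported functions are also entered in Thumb state.
• The instruction `BCC` (branch if carry clear, also written `BLO`) branches if the preceding comparison found the first operand lower than the second, with both treated as unsigned.
• The API function pointers are fetched in two steps. `LDR R3, =__imp_GetSystemTime` is a PC-relative load from a literal pool that yields the address of the import table entry, and `LDR R3, [R3]` then reads the function pointer stored in that entry. The other loads and stores use either an immediate offset such as `[R6,#4]` (base register plus a constant) or a register offset such as `[R6,R4]` (base register plus a second register).
• In instructions like `LDR R3, [SP,#0x1C+var_1C]`, `var_1C` is a stack-variable name generated by the disassembler with the value `-0x1C`. The instruction therefore boils down to `LDR R3, [SP,#0]`, which the encoding `00 9B` (offset field 0) confirms. Likewise, `var_18`, `var_14` and `var_10` give `[SP,#4]`, `[SP,#8]` and `[SP,#0xC]`.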
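
A small C model makes the `BCC` condition concrete. It assumes the standard ARM rule that `CMP a, b` sets the carry flag when the subtraction `a - b` does not borrow. The helper names `cmp_sets_carry` and `bcc_taken` are made up for this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of ARM "CMP a, b": the carry flag is set when a - b does not
   borrow, i.e. when a >= b as unsigned 32-bit numbers. */
static bool cmp_sets_carry(uint32_t a, uint32_t b)
{
    /* a - b is computed as a + ~b + 1; bit 32 of the 64-bit sum is the carry. */
    uint64_t sum = (uint64_t)a + (uint64_t)(uint32_t)~b + 1u;
    return (sum >> 32) != 0;
}

/* BCC (alias BLO) is taken when the carry flag is clear. */
static bool bcc_taken(uint32_t a, uint32_t b)
{
    return !cmp_sets_carry(a, b);
}
```

For example, when the size in `R5` is 20 and `R4` is 16, `SUBS R3, R5, R4` yields 4 and `CMP R3, #4` sets the carry. The branch is therefore not taken, and the `GetCurrentProcessId` block runs. A signed branch (`BLT`) would treat `0xFFFFFFFF` as -1 and branch, but `BCC` does not.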

Types

The first function parameter in `R0` is never read. The second function parameter in `R1` (copied to `R5`) is the buffer size and determines which calls to execute. It is an unsigned integer type, which we know because it is only compared with unsigned branches (`BCC`). The third parameter in `R2` (copied to `R6`) points to the output buffer. For a buffer of at least 32 bytes, it receives the results of the four API calls with this layout:

```c
typedef struct _STRUCT1 {
    WORD wYear;
    WORD wMonth;
    WORD wDayOfWeek;
    WORD wDay;
    WORD wHour;
    WORD wMinute;
    WORD wSecond;
    WORD wMilliseconds;
    DWORD dwProcessId;
    DWORD dwTickCount;
    LARGE_INTEGER liPerformanceCounter;
} STRUCT1;
```

The function returns the number of bytes written to the buffer. This is the final value of `R4`, which is moved to `R0` before the epilogue.
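
As a sanity check on these offsets, here is a portable mirror of `STRUCT1` that uses fixed-width types in place of `WORD`, `DWORD` and `LARGE_INTEGER`, so it compiles without `<windows.h>`. The name `STRUCT1_MIRROR` is hypothetical. The static assertions check that the members land at offsets 16, 20 and 24 and that the whole structure is 32 bytes, which matches a full-size buffer.

```c
#include <stdint.h>
#include <stddef.h>

/* Portable mirror of STRUCT1 (hypothetical name):
   WORD -> uint16_t, DWORD -> uint32_t, LARGE_INTEGER -> int64_t. */
typedef struct {
    uint16_t wYear;                /* offsets 0..15: copied SYSTEMTIME */
    uint16_t wMonth;
    uint16_t wDayOfWeek;
    uint16_t wDay;
    uint16_t wHour;
    uint16_t wMinute;
    uint16_t wSecond;
    uint16_t wMilliseconds;
    uint32_t dwProcessId;          /* offset 16 */
    uint32_t dwTickCount;          /* offset 20 */
    int64_t  liPerformanceCounter; /* offset 24, written as two 32-bit words */
} STRUCT1_MIRROR;

/* Compile-time checks of the offsets used by the listing. */
_Static_assert(offsetof(STRUCT1_MIRROR, dwProcessId) == 16, "pid at 16");
_Static_assert(offsetof(STRUCT1_MIRROR, dwTickCount) == 20, "ticks at 20");
_Static_assert(offsetof(STRUCT1_MIRROR, liPerformanceCounter) == 24, "counter at 24");
_Static_assert(sizeof(STRUCT1_MIRROR) == 32, "total size 32");
```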

Function Prototype

The function prototype is:

`unsigned int mystery10(UNKNOWN unused, unsigned int size, STRUCT1 *buffer);`

Prologue and Epilogue

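The prologue consists of three parts:

• `PUSH.W {R4–R6,R11,LR}` saves the callee-saved registers R4 to R6 that the function uses, the frame pointer R11 and the return address in `LR`.
• `ADDW R11, SP, #0xC` points R11 at its own saved copy, which sets up the new frame.
• `SUB SP, SP, #0x10` reserves 16 bytes for locals.

R0 to R3 are scratch registers under the ARM calling convention and are not preserved. The epilogue undoes this with `ADD SP, SP, #0x10` and `POP.W {R4–R6,R11,PC}`, which restores the registers and returns by popping the saved `LR` into `PC`.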
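
The `#0xC` in `ADDW R11, SP, #0xC` follows from how `PUSH` lays out the registers. Lower-numbered registers go to lower addresses, so after `PUSH.W {R4–R6,R11,LR}` the saved R11 is the fourth word, 12 bytes above the new SP. The hypothetical helper below computes that offset from the 16-bit register mask used in the instruction encoding.

```c
#include <stdint.h>

/* Byte offset (from the SP after PUSH) of the slot holding register `reg`.
   PUSH stores lower-numbered registers at lower addresses, so the slot
   index is the number of listed registers below `reg`.
   Returns -1 if `reg` is not in the list. */
static int push_slot_offset(uint16_t reglist, unsigned reg)
{
    if (reg > 15 || !(reglist & (1u << reg)))
        return -1;
    int slot = 0;
    for (unsigned r = 0; r < reg; r++)
        if (reglist & (1u << r))
            slot++;
    return slot * 4;
}
```

The mask `0x4870` is the second halfword of `2D E9 70 48`, and `push_slot_offset(0x4870, 11)` gives 12. The same rule gives `0x20` for the `PUSH.W {R3–R11,LR}` (mask `0x4FF8`) in mystery11.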

Purpose and Pseudo-code

The function calls up to four different API functions and stores their results in the buffer passed as the third function parameter. Depending on the value of the second function parameter (the buffer size in `R1`), each API call is either made or skipped:

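| Value of `R1` | `GetSystemTime` | `GetCurrentProcessId` | `GetTickCount` | `QueryPerformanceCounter` | Return value |
|---|---|---|---|---|---|
| R1 ≥ 32 | executed | executed | executed | executed | 32 |
| 32 > R1 ≥ 24 | executed | executed | executed | - | 24 |
| 24 > R1 ≥ 20 | executed | executed | - | - | 20 |
| 20 > R1 ≥ 16 | executed | - | - | - | 16 |
| 16 > R1 ≥ 8 | - | executed | executed | - | 8 |
| 8 > R1 ≥ 4 | - | executed | - | - | 4 |
| 4 > R1 ≥ 0 | - | - | - | - | 0 |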
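
The thresholds in the table follow from the running offset `R4`. Each block runs only if the remaining space `R5 - R4` is large enough, and it then advances `R4` by the number of bytes it wrote. The portable sketch below models only that bookkeeping and makes no Windows API calls. The name `model_mystery10` and the `CALL_*` flags are made up for illustration.

```c
#include <stdint.h>

/* Bit flags recording which API calls the function would make. */
enum {
    CALL_SYSTEMTIME = 1 << 0,
    CALL_PROCESSID  = 1 << 1,
    CALL_TICKCOUNT  = 1 << 2,
    CALL_PERFCOUNT  = 1 << 3,
};

/* Model of mystery10's size checks: returns the bytes written (R4)
   and stores the set of calls made in *calls. */
static uint32_t model_mystery10(uint32_t size, unsigned *calls)
{
    uint32_t offset = 0;                 /* R4 */
    *calls = 0;
    if (size >= 16) {                    /* CMP R5, #0x10 / BCC */
        *calls |= CALL_SYSTEMTIME;
        offset = 16;
    }
    if (size - offset >= 4) {            /* SUBS R3, R5, R4 / CMP R3, #4 */
        *calls |= CALL_PROCESSID;
        offset += 4;
    }
    if (size - offset >= 4) {
        *calls |= CALL_TICKCOUNT;
        offset += 4;
    }
    if (size - offset >= 8) {            /* CMP R3, #8 */
        *calls |= CALL_PERFCOUNT;
        offset += 8;
    }
    return offset;
}
```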

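Each block writes at the current offset `R4` and then advances it by the number of bytes written, so the results are packed back to back. The `STRUCT1` layout therefore only applies when `R1` is at least 32. With a smaller buffer, the later values move to lower offsets. For example, for 8 ≤ `R1` < 16 the process id lands at offset 0 and the tick count at offset 4. The four API calls are well documented: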

GetSystemTime

This function retrieves the current system date and time in UTC and stores it in the structure passed as its only parameter, see MSDN. The structure is

```c
typedef struct _SYSTEMTIME {
    WORD wYear;
    WORD wMonth;
    WORD wDayOfWeek;
    WORD wDay;
    WORD wHour;
    WORD wMinute;
    WORD wSecond;
    WORD wMilliseconds;
} SYSTEMTIME, *PSYSTEMTIME;
```

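Our code passes the stack pointer (`MOV R0, SP`) to `GetSystemTime`, so the structure is a local variable. The `SUB SP, SP, #0x10` in the prologue reserves exactly the 16 bytes needed for the eight two-byte members of `SYSTEMTIME`. After the call, the code loads the words at offsets `0`, `4`, `8` and `12`. Since `LDR` loads 32 bits, each load covers two members of `SYSTEMTIME`. The four `LDR`/`STR` pairs therefore copy the entire structure to the buffer in `arg3` (held in `R6`), and `MOVS R4, #0x10` records that 16 bytes have been written.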
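
The four `LDR`/`STR` pairs are simply an unrolled 16-byte copy. The sketch below copies a structure word by word in the same way as the listing. It uses a made-up `SYSTEMTIME_MIRROR` type, a portable stand-in with the same 8 × 16-bit layout as `SYSTEMTIME`. The result is equivalent to a single `memcpy`.

```c
#include <stdint.h>
#include <string.h>

typedef struct {            /* same layout as SYSTEMTIME: eight 16-bit members */
    uint16_t w[8];
} SYSTEMTIME_MIRROR;

/* Copy the 16-byte structure as four 32-bit words, mirroring
   LDR R3, [SP,#n] / STR R3, [R6,#n] for n = 0, 4, 8, 12. */
static void copy_by_words(uint8_t *dst, const SYSTEMTIME_MIRROR *src)
{
    for (size_t n = 0; n < sizeof *src; n += 4) {
        uint32_t word;                               /* plays the role of R3 */
        memcpy(&word, (const uint8_t *)src + n, 4);  /* LDR */
        memcpy(dst + n, &word, 4);                   /* STR */
    }
}
```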

GetCurrentProcessId

This API call takes no parameters and returns a `DWORD` with the process id, see MSDN. `STR R0, [R6,R4]` stores the return value at the current offset. This is `arg3->dwProcessId` (offset 16) as long as `GetSystemTime` ran first, i.e., for `arg2` of at least 20.

GetTickCount

Another easy function with no parameters that returns “the number of milliseconds that have elapsed since the system was started”, see MSDN. The return value is again stored at the current offset. This is `arg3->dwTickCount` (offset 20) for `arg2` of at least 24.

QueryPerformanceCounter

The function takes “a pointer to a variable that receives the current performance-counter value, in counts” as its only parameter. The type of this parameter is a pointer to `LARGE_INTEGER`, which is 8 bytes wide. Our code again passes the local stack buffer (`MOV R0, SP`). It then copies the 8 bytes to `arg3->liPerformanceCounter` (offset 24, for `arg2` of at least 32) with two `STR` instructions, low word first.
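
The two stores write the low and the high 32-bit halves of the counter, which correspond to the `LowPart` and `HighPart` members of `LARGE_INTEGER`. The low half goes to the lower address, matching the little-endian layout of a 64-bit value on Windows RT. A portable sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* Store a 64-bit counter as two 32-bit words, like
   STR R3, [R6,R4] (low half) and STR R3, [R2,#4] (high half),
   where R2 = R6 + R4. Each word is written in little-endian byte order. */
static void store_counter(uint8_t *buf, uint32_t offset, uint64_t counter)
{
    uint32_t halves[2] = { (uint32_t)counter, (uint32_t)(counter >> 32) };
    for (int h = 0; h < 2; h++)
        for (int i = 0; i < 4; i++)
            buf[offset + 4 * h + i] = (uint8_t)(halves[h] >> (8 * i));
}
```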

C-Code

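```c
#include <windows.h>
#include <string.h>

/* mystery10: appends up to four values to the buffer, each only if it
   still fits, and returns the number of bytes written. The first
   parameter is never used. STRUCT1 is the typedef from above. */
unsigned int system_info(void *unused, unsigned int size, STRUCT1 *buffer)
{
    BYTE *out = (BYTE *)buffer;     /* R6 */
    unsigned int offset = 0;        /* R4: bytes written so far */

    (void)unused;
    if (size >= sizeof(SYSTEMTIME)) {                 /* 16 bytes */
        SYSTEMTIME systemTime;
        GetSystemTime(&systemTime);
        memcpy(out, &systemTime, sizeof(systemTime)); /* four LDR/STR pairs */
        offset = sizeof(SYSTEMTIME);
    }
    if (size - offset >= sizeof(DWORD)) {
        DWORD processId = GetCurrentProcessId();
        memcpy(out + offset, &processId, sizeof(processId));
        offset += sizeof(DWORD);
    }
    if (size - offset >= sizeof(DWORD)) {
        DWORD tickCount = GetTickCount();
        memcpy(out + offset, &tickCount, sizeof(tickCount));
        offset += sizeof(DWORD);
    }
    if (size - offset >= sizeof(LARGE_INTEGER)) {
        LARGE_INTEGER perfCounter;
        QueryPerformanceCounter(&perfCounter);
        memcpy(out + offset, &perfCounter, sizeof(perfCounter)); /* two STRs */
        offset += sizeof(LARGE_INTEGER);
    }
    return offset;
}
```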

Mystery 11

Exercise 11

In Figure 2-17, `sub_101651C` takes three arguments and returns nothing. If you complete this exercise, you should pat yourself on the back.

I wasn’t able to solve this exercise, but I’m posting my preliminary results regardless. Maybe they help someone else reverse the code.

This is `mystery11` from Figure 2-17:

```mystery11
2D E9 F8 4F    PUSH.W {R3–R11,LR}
0D F2 20 0B    ADDW R11, SP, #0x20
B0 F9 5A 30    LDRSH.W R3, [R0,#0x5A]
07 46          MOV R7, R0
90 46          MOV R8, R2
00 EB 83 03    ADD.W R3, R0, R3,LSL#2
D3 F8 84 A0    LDR.W R10, [R3,#0x84]
7B 8F          LDRH R3, [R7,#0x3A]
89 46          MOV R9, R1
CB B9          CBNZ R3, loc_1018602
B0 F9 5A 40    LDRSH.W R4, [R0,#0x5A]
17 F1 20 02    ADDS.W R2, R7, #0x20
00 EB 44 03    ADD.W R3, R0, R4,LSL#1
B3 F8 5C 50    LDRH.W R5, [R3,#0x5C]
00 EB 84 03    ADD.W R3, R0, R4,LSL#2
D3 F8 84 00    LDR.W R0, [R3,#0x84]
83 89          LDRH R3, [R0,#0xC]
06 6C          LDR R6, [R0,#0x40]
03 EB 45 03    ADD.W R3, R3, R5,LSL#1
9B 19          ADDS R3, R3, R6
1C 78          LDRB R4, [R3]
5B 78          LDRB R3, [R3,#1]
43 EA 04 24    ORR.W R4, R3, R4,LSL#8
43 8A          LDRH R3, [R0,#0x12]
23 40          ANDS R3, R4
99 19          ADDS R1, R3, R6
FD F7 8D FF    BL sub_101651C
loc_1018602
BA 8E          LDRH R2, [R7,#0x34]
BB 6A          LDR R3, [R7,#0x28]
D0 18          ADDS R0, R2, R3
9A F8 02 30    LDRB.W R3, [R10,#2]
0B B1          CBZ R3, loc_1018612
00 22          MOVS R2, #0
00 E0          B loc_1018614
loc_1018612
3A 6A          LDR R2, [R7,#0x20]
loc_1018614
FB 8E          LDRH R3, [R7,#0x36]
B8 F1 00 0F    CMP.W R8, #0
01 D0          BEQ loc_1018620
80 18          ADDS R0, R0, R2
9B 1A          SUBS R3, R3, R2
loc_1018620
C9 F8 00 30    STR.W R3, [R9]
BD E8 F8 8F    POP.W {R3–R11,PC}
; End of function mystery11
```

ARM or Thumb

The code is in Thumb state:

• The function uses `PUSH.W` and `POP.W` as function prologue and epilogue.
• There are both 16-bit and 32-bit instructions. In ARM state every instruction is 32 bits wide.
• The `.W` suffix explicitly selects a wide (32-bit) Thumb-2 encoding.
• The `CBZ` instruction is only available in Thumb state.

Instruction Semantic

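The load and store instructions work as follows:

• `LDRB`/`LDRB.W` load an unsigned byte, `LDRH`/`LDRH.W` an unsigned 16-bit halfword, `LDRSH.W` a signed halfword, and `LDR`/`LDR.W` a 32-bit word.
• All loads use the immediate-offset addressing mode `[Rn,#imm]`, where the address is the base register plus a constant. `STR.W R3, [R9]` stores through `R9` with offset zero.

The remaining instructions compute addresses and branch:

• `ADD.W R3, R0, R3,LSL#2` computes `R0 + 4*R3`, the typical address calculation for indexing an array of 32-bit elements. `LSL#1` scales by 2 for 16-bit elements.
• `CBZ`/`CBNZ` branch forward if a register is zero/non-zero without touching the flags.
• `BEQ` after `CMP.W R8, #0` branches if `R8` is zero.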
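
Whether a field is recovered as signed or unsigned depends on the difference between zero-extending and sign-extending loads. This portable sketch reads the same bytes the way `LDRB`, `LDRH` and `LDRSH` would (little-endian, as on Windows RT). The helper names are illustrative only.

```c
#include <stdint.h>

/* LDRH: load a little-endian halfword and zero-extend it to 32 bits. */
static uint32_t ldrh(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

/* LDRSH: load the same halfword but sign-extend bit 15. */
static int32_t ldrsh(const uint8_t *p)
{
    uint32_t v = ldrh(p);
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* LDRB: load one byte and zero-extend it. */
static uint32_t ldrb(const uint8_t *p)
{
    return p[0];
}
```

For the bytes `F0 FF`, `LDRH` yields 65520 while `LDRSH` yields -16. This is why `field5A`, which is loaded with `LDRSH.W`, is treated as a signed short index.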

Types

The first function parameter `arg1` in `R0` is a pointer to a complicated structure; let this structure be `struct1`. One can infer a couple of its members from the different `LDR` instructions. In the pseudo-code below, I named them after their offset and type. For example, `field5A_s` is the signed short at offset 0x5A and `field84_p` the pointer at offset 0x84. Again, since I couldn’t figure out what the function does, this picture could be completely off.

The second parameter `arg2` is a pointer to a 32-bit integer, since `STR.W R3, [R9]` writes a word through it. The third parameter `arg3` is only compared to zero and could be almost anything, e.g., a pointer, an integer or a boolean. The return value of `mystery11` in `R0` is an integer computed from members of `struct1`.

Function Prototype

The function prototype might look like this:

```c
int mystery11(struct1 *arg1, int *arg2, unknown *arg3);
```

Prologue and Epilogue

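The function preserves registers `R3` to `R11` with `PUSH.W`/`POP.W`. It uses the same two instructions to save `LR` and to return, because the saved `LR` is popped into `PC`. `ADDW R11, SP, #0x20` points the frame pointer `R11` at its own saved copy. `R3` is a scratch register and is presumably pushed only to keep the stack 8-byte aligned (ten registers, 40 bytes).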

Purpose and Pseudo-code

I have no clue what the function does. Most instructions just access different members of the structure in `arg1`. The following three instructions from the block guarded by `CBNZ` are interesting:

```
1C 78          LDRB R4, [R3]
5B 78          LDRB R3, [R3,#1]
43 EA 04 24    ORR.W R4, R3, R4,LSL#8
```

They load two bytes from the memory location in `R3`, shift the first byte left by 8 (i.e., multiply it by 256) and OR in the second byte. So this snippet loads a big-endian 16-bit short, even though Windows RT itself is little-endian. This could indicate that the function is operating on an external data structure with big-endian shorts, like TCP/IP data.
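
In C, the three instructions correspond to something like the helper below. The name `load_be16` is hypothetical, and the original source may well have used a macro or an `ntohs`-style function instead.

```c
#include <stdint.h>

/* LDRB R4, [R3]; LDRB R3, [R3,#1]; ORR.W R4, R3, R4,LSL#8
   The first byte is the high byte, so this is a big-endian 16-bit load. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```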

My draft of the pseudo-code is:

```
INT mystery11(STRUCT1 *arg1, INT *arg2, UNKNOWN *arg3)

SHORT idx = arg1->field5A_s                  // LDRSH.W: signed index
struct4* pS4 = arg1->field84_p[idx]          // pointer array at 0x84 (R10)

IF arg1->field3A_s == 0 THEN
    USHORT slot = arg1->field5C_s[idx]       // short array at 0x5C (R5)
    BYTE* p = pS4->field40_p + pS4->field0C_s + 2*slot
    USHORT val = CONVERT_BIG_ENDIAN_SHORT(p) // LDRB, LDRB, ORR.W
    INT index2 = pS4->field12_s & val
    sub_101651C(pS4, pS4->field40_p + index2, &arg1->field20_i)
ENDIF

INT offset
IF pS4->field02_c == 0 THEN
    offset = arg1->field20_i
ELSE
    offset = 0
ENDIF

INT return_value = arg1->field28_i + arg1->field34_s
INT new_value = arg1->field36_s

IF arg3 != 0 THEN
    return_value = return_value + offset
    new_value = new_value - offset
ENDIF

*arg2 = new_value
RETURN return_value
```
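
Step 7 asks for C code, so here is a sketch of how the pseudo-code might look as C. All structure and member names are hypothetical and derived from the field offsets. The array lengths are unknown and set to 4 only so that the sketch compiles. Because I do not know what `sub_101651C` does, it is a stub that merely records its arguments. The sketch uses native pointers, so only the offsets in the comments, not the actual memory layout, correspond to the 32-bit Windows RT binary.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical inner structure (pointer held in R10/R0). */
typedef struct INNER {
    uint8_t  field02;        /* +0x02, LDRB.W R3, [R10,#2] */
    uint16_t field0C;        /* +0x0C, LDRH   R3, [R0,#0xC] */
    uint16_t field12;        /* +0x12, LDRH   R3, [R0,#0x12], used as a mask */
    uint8_t *field40;        /* +0x40, LDR    R6, [R0,#0x40] */
} INNER;

/* Hypothetical outer structure (arg1, kept in R7). */
typedef struct OUTER {
    int32_t  field20;        /* +0x20 */
    int32_t  field28;        /* +0x28 */
    uint16_t field34;        /* +0x34 */
    uint16_t field36;        /* +0x36 */
    uint16_t field3A;        /* +0x3A */
    int16_t  field5A;        /* +0x5A, LDRSH.W: signed index */
    uint16_t table5C[4];     /* +0x5C, 16-bit array, length unknown */
    INNER   *table84[4];     /* +0x84, pointer array, length unknown */
} OUTER;

/* Stub for the unknown callee: records its arguments for inspection. */
static INNER   *last_inner;
static uint8_t *last_entry;
static int32_t *last_field20;

static void sub_101651C(INNER *inner, uint8_t *entry, int32_t *field20)
{
    last_inner = inner;
    last_entry = entry;
    last_field20 = field20;
}

static int32_t mystery11(OUTER *arg1, int32_t *arg2, int arg3)
{
    INNER *inner = arg1->table84[arg1->field5A];             /* R10 */

    if (arg1->field3A == 0) {                                /* CBNZ R3 */
        uint16_t slot = arg1->table5C[arg1->field5A];        /* R5 */
        const uint8_t *p = inner->field40 + inner->field0C + 2u * slot;
        uint32_t key = ((uint32_t)p[0] << 8) | p[1];         /* big-endian */
        sub_101651C(inner, inner->field40 + (key & inner->field12),
                    &arg1->field20);                         /* R0, R1, R2 */
    }

    int32_t skip = inner->field02 ? 0 : arg1->field20;       /* CBZ R3 */
    int32_t result = arg1->field28 + arg1->field34;          /* R0 */
    int32_t length = arg1->field36;                          /* R3 */
    if (arg3 != 0) {                                         /* CMP.W R8, #0 */
        result += skip;
        length -= skip;
    }
    *arg2 = length;                                          /* STR.W R3, [R9] */
    return result;
}
```

Here `arg3` is typed as `int` because the code only compares it with zero; the real type remains unknown.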