Trying to think like a compiler, Part 3
(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)
I have a confession to make. Although I’ve managed to figure out many puzzles on this project, including spending 5 months on searching for the right compiler flags, and coming out on top thus far, sometimes I still think it all might actually be impossible. The idea that code can be recovered from these old binaries and made to line up perfectly instruction by instruction sounds so absurd that I find it hard to believe, even though I’ve been managing to pull it off for a while now. Whenever I pick a new routine to reconstruct, I wonder, “is this going to be the one that defeats me?”. The rush that I get after clearing a routine, especially one that seemed impossible is intoxicating, which is probably the reason I keep doing this night after night. So let me share some more examples of what has been on my plate recently.
The one that got away
A while back, I mentioned coming across a routine that had the look of C, but it contained a repne movsw
instruction which I could not get the compiler to emit, at least not in the exact way needed:
I was expecting this to be a folding of memcpy()
into an intrinsic function, which are the 1980s equivalent of inline functions, and it was close, but not 100%. So I shelved this routine and moved on. Some time later, doing work on an unrelated routine, I figured out that the data involved was actually part of an array of structures, so my IDA-generated listing changed to the following around this area:
Is it memcpy()
ing an array member from one structure to another? Nope, 0x10
is the size of the entire structure (I know this by examining how the data is used elsewhere). Sometimes I forget that C can assign struct variables to each other, because it’s not used that often. But that is exactly what this is. It’s jiggling pilot roster entries around in a loop:
This matches the instruction sequence perfectly. The compiler took a shortcut to copying around entire structures (whose size is a compile time constant) using the 8086 string instructions and it took me a while to figure out how clever it was. Feels good to have it off the list though.
The gloves come off
Until now, I’ve had limited exposure to optimized code from this compiler. At some point I noticed that some routines were not doing the usual sub sp, N
for taking arguments off the stack after a function call, when the call was the last thing done in a function (because the stack pointer will be restored when the current routine returns so why bother). It turned out i had to remove the hard-earned /Zi
flag (which disables some optimizations for debugging) for the source file containing the affected routines, meaning they are now built with the default optimization settings of /Ot
(optimize for speed). Because in my reconstructed source code I have the routines arranged by the offset at which they appear in the original, it’s easy to see in which file a routine belongs that I just picked up for working on it. Until now, these routines did not do anything too complicated and were rather simple to rewrite into C, which lulled me into a sense of false security, concluding that the MSC optimizer was incapable of anything more exciting than simple peephole optimizations. If I only knew how wrong I was.
The comparison tool tripped on this routine for handling key input on the pilot selection screen, so I picked it up and started work on writing the C code. Soon, I was scratching my head at the erratic execution flow that was jumping all over the place, and that was supposed to come down to neat if
conditions and for
loops. Below is the routine, somewhat abbreviated, but otherwise showing the whole control flow:
For the record, writing out this routine in exactly the same sequence as the instructions in the disassembly, then using goto
to try and obtain the same sequence of control flow jumping around does not work, it’s not even close. This was written as a series of more or less neat control structures, and it was the compiler which made a mess out of it. It will not take in that same mess as input and generate that same output, its operation is not idempotent in that way.
The main problem here (other than the control jumping around a lot) is getting the jmp short
at 0x1f54
, and the assignment to var_2
at 0x1f7b
at the same time. Whenever I do something like this:
The compiler would do the assignment and the call for the switch where it was encountered, at the top of the loop, and not emit a jump.
I actually shelved this routine after staring at it for two days, completed some other ones, then encountered an another, even more hopeless-looking routine that I’m describing in the next section. Only after figuring that one out, did I come back here – it gave me the confidence that this was actually doable.
At some point, I removed the assignment, thinking it might be something the compiler is doing by itself (I’ve seen bogus writes to stack variables coming from nowhere before), and started playing with different loop variants. Then I noticed the jump appearing when the switch was directly under the loop, not in a brace, i.e. for(;;) switch()...
, not for(;;) { switch()...
. Why it acts that way, I have no idea. But the assignment was still missing. Huh, I knew the weird usage of the comma in C would come in handy some day. This was the golden ticket:
The weird layout of the subsequent switch sections matched the original perfectly by itself. It placed a sequence of comparisons at the beginning, and the actual code from the cases later, and the order of the comparisons seems to be dicated by the numerical value of the case, not the order they are put in the code - but it does influence the order that the handling code is laid out. The usage of the comma operator is bugging me a little bit as it’s not used often and there are simpler ways of spelling out the required control flow, but it will have to stay that way for now. Maybe somebody was trying to be clever, or they needed to add an extra step to the loop in a hurry and couldn’t be bothered to add braces - who knows.
I wanted my win quickly, but the compiler didn’t want to cooperate with this sequence of initializing members of a struct var at 0x1fbc
:
Instead of going through al
for the 8-bit members, ax
for the 16-bit one, and ax:dx
for the 32-bit, it would just do straight mov [0x...], 0x0
. Then I realized it would take assignment chaining:
The order of these is important, I had to essentially brute force my way around it, because if it’s different in even the slightest, then the code won’t match. It was particularly painful to get the 32-bit load with sub dx, dx
– it would just load the older byte with a constant 0. Again, I have no idea why having 8-bit values earlier in the chain makes it use dx
, but there you go.
The last item that was giving me trouble was the common handling code for the arrow key cases:
I could not put the handling code in the body of the loop and have the arrowkey cases break out of the switch into the common code, because making the loop have braces made my coveted jmp short
disappear. So I used the gotos. However, written out like this, it would move the common code under the first case block, and jump to it up from the others, and I needed to have it at the bottom and jump down. So I moved the code into the switch myself and this was the final piece:
Ctrl-C, Ctrl-V
This next routine handles typing in the name from the player on the pilot selection screen. Again, the control flow is all over the place, specifically within this section:
Overall, it’s a mess which I have untangled once and promptly forgotten. Ultimately, it comes to this:
The compiler appears to have synthesized the repeated expressions (the pilot->field[0]
, pilot->field[var_10]
assignments and the clearRect()...actualDrawString()
sequence) into one code block between 0x23eb
and 0x243b
, and it’s jumping into various points inside it from the different cases in the switch to account for the differences between the cases. Outwardly, it has the appearance of an inner loop, but it’s just code deduplication. So the lesson here is, for optimized code, if something crazy is happenning like one branch jumping into the middle of another, it might be possible (or even necessary) to avoid a goto and reduplicate the deduplicated code to get a match.
The other lesson is that reconstructing optimized code from this compiler is possible, if not always straightforward. I’m grateful that big chunks of the scenario disk version of the game have been built in debug mode with /Zi
(definitely the monstrous 0x4093 routine, I would have been screwed or at least considerably set back if that was optimized), which let me get off the ground and make progress initially. I’m a more seasoned reverse engineer now, which makes me believe I can take on the optimized routines when they inevitably come - probably until the next one, which is when the impostor syndrome will kick in again. 😉
Reconstructing optimized code is a mixed bag. On one hand, the optimizer will make many equivalent variations of your code converge to the same optimized form – so in some cases at least, you don’t need to be exact, there’s a natural “pull” for the opcodes to get organized in an optimized way. On the other hand, this also means that when you can’t get the compiler to emit the exact instructions that you need, it feels like nothing you do makes a difference, because of that same pull. That is until you figure out the factor that actually makes it behave different, and it all falls into place with a loud thud.
With this, and other routines I’ve done recently in place, I’m looking at 54% completion (size-wise) in porting overall, with 48 routines ported and 146 remaining (most of them probably assembly). My comparison tool is currently stopping at the 80% mark of the reachable code area, meaning it’s seen and compared or ignored that much of the non-libc code. Basically the first number (54%) is the progress of the overall bigger goal of porting START.EXE
to C, while the second (80%) is for the completion of the slightly smaller sub goal of the C code reconstruction.