(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)
What I’m doing right now is going through the disassembly of the game executables, rewriting the code in C, and trying to obtain identical instructions when compiling it back into executable form, using mzdiff to do the comparison. Some parts refuse to match no matter what I do (probably because I still haven’t nailed the right combination of compiler and options), but I have been making progress with transcribing the code by using some more advanced options I implemented in mzdiff to ignore these differences. I would like to get rid of the differences completely, but I’ve burned out on that front, so I decided to move forward and make some actual progress on the recreation instead.
While working on the START executable’s main() function, I came across this sequence of instructions in the disassembly:
Rewritten into C, it might look something like this:
This, however, generates the following code, which does not exactly match:
It makes sense to reuse the segment value that is already in ES when pushing it as an argument, so why did the compiler use DX as a temporary location for it in the original game executable? It took me a couple of days to figure out. The function does not accept two separate arguments, a segment and an offset. It accepts a single far pointer argument, and the register pair DX:AX, both arithmetic-capable registers, serves as a placeholder so that the entire 32-bit value can be manipulated as a whole. The fix is just a matter of correcting the declaration and the call site to match:
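As a rough, portable illustration of the idea (this is not the game's code, and `far_ptr_t`, `make_far`, `far_segment`, `far_offset`, `far_linear` and `far_ptr_demo` are all hypothetical names), a far pointer can be modeled as one 32-bit value with the segment in the high word, the half that would sit in DX, and the offset in the low word, the half that would sit in AX:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit far pointer: one 32-bit value,
   segment in the high word (DX half), offset in the low word (AX half). */
typedef uint32_t far_ptr_t;

/* Pack a segment and an offset into a single 32-bit value. */
far_ptr_t make_far(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 16) | offset;
}

/* High word: the part a 16-bit compiler would keep in DX. */
uint16_t far_segment(far_ptr_t p) { return (uint16_t)(p >> 16); }

/* Low word: the part a 16-bit compiler would keep in AX. */
uint16_t far_offset(far_ptr_t p) { return (uint16_t)(p & 0xFFFFu); }

/* Real-mode linear address: segment * 16 + offset. */
uint32_t far_linear(far_ptr_t p) {
    return ((uint32_t)far_segment(p) << 4) + far_offset(p);
}

/* Small usage example. */
void far_ptr_demo(void) {
    far_ptr_t p = make_far(0x1234, 0x0010);
    assert(far_segment(p) == 0x1234);
    assert(far_offset(p) == 0x0010);
    assert(far_linear(p) == 0x12350);
}
```

A function declared to take one such value receives both halves as a unit, which is consistent with the compiler staging the segment in DX instead of pushing ES directly.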
Now the code matches up. However, some time later I came across this surprise:
This is part of a longer conditional expression, but again, rewritten into C, it comes to:
The problem I encountered was that the compiler generates code like this:
Why would the compiler put the value of var_C in AX the first time and compare with a far location, then reverse the order and put the far value in AX and compare with the stack location of var_6? The first thing that came to mind was to reverse the order of comparison in the condition for the second part of the ||:
This did not make a difference. I am not entirely sure how I came up with the next idea, but my thinking was that CMP is essentially a SUB, i.e. a subtraction that only sets the flags, so the signedness of the values and the order in which they appear in the subtraction might matter. Surprisingly, all it took for the code to match was to change the declaration of var_6 to signed:
Again, this makes the code match up nicely.
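To see why signedness is something the compiler cannot ignore when emitting a comparison, here is a small sketch in portable C. The function names are hypothetical, and it does not reproduce the operand ordering described above, which depends on the original compiler. It only shows that the same 16-bit pattern gives opposite answers depending on the declared type, which is why x86 has separate conditional jumps for unsigned (JB/JA) and signed (JL/JG) results of the same CMP:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 16-bit comparison (JB/JA-style conditions on x86). */
int less_unsigned(uint16_t a, uint16_t b) { return a < b; }

/* Signed 16-bit comparison (JL/JG-style conditions on x86). */
int less_signed(int16_t a, int16_t b) { return a < b; }

/* Small usage example: one bit pattern, two interpretations. */
void signedness_demo(void) {
    int16_t  s = -1;            /* bit pattern 0xFFFF */
    uint16_t u = (uint16_t)s;   /* same bits, read as 65535 */

    assert(u == 0xFFFF);
    assert(less_signed(s, 1) == 1);    /* -1 < 1 is true */
    assert(less_unsigned(u, 1) == 0);  /* 65535 < 1 is false */
}
```

Since the declared type changes what the comparison means, it is plausible that it also nudges the compiler toward a different instruction selection, even when the C expression looks identical.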
I’m happy to have figured these minor pitfalls out, and I’m sure it will become useful elsewhere as I’m gathering a body of knowledge and building up the capability to recognize the compiler’s patterns within my wetware. 😉