Trying to think like a compiler, Part 2
(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)
Nothing terribly exciting happened since the last post. I am (slowly) porting START.EXE
functions from assembly to C one by one, doing some additional research using IDA and the dosbox debugger to figure out what specific routines do at the same time. The game is still not runnable, but I’m making progress.
One thing that changed is that I’ve made a lst2ch.py
script to convert the listing output of IDA into a C header and source file. That way, whenever I rename routines or variables in IDA, I do not need to carry them over into the header file manually - the header file is now autogenerated. I still need to rename stuff in my source files manually, at least for now, but at least it gives me an autocomplete on all routines and data objects. The source file contains definitions for all the data, with the eventual goal of eventually moving it into C - I’m linking with an autogenerated assembly object that has my ported routines automatically removed for now.
I have also implemented some statistics into the script which show me how much work remains to be done:
ninja@dell:eaglestrike$ make start tools/lst2ch.py lst/start.lst src/f15-se2 conf/start_rc_conf.py --noc Writing C header file: src/f15-se2/start.h Found routines: total: 247, ignored: 53, remaining: 194, ported: 23, need porting: 171 Accumulated routines' size: remaining: 21638/0x5486/100%, ported: 4198/0x1066/19.4%, need porting: 17440/0x4420/80.6% Found 1013 variables, size = 31504/0x7b10 [...]
The 53 ignored routines are those from the C library and the single-instruction far jump trampoline routines invoking code from the driver overlays. That leaves us with 194 routines to rewrite, either into C or assembly (some of the game’s routines were obviosly written in assembly and hence are impossible to match exactly from C) with hardcoded numerical offsets resolved into correct data and code references. As evident from the output, I have completed 23 routines, totaling around 4KB of code, which amounts to slightly below 20% of the total work - for the first executable alone! Well, nobody said this was going to be quick or easy.
I also made a vga2bmp.py
script to turn binary data dumps from the Dosbox debugger into bitmaps, with customizable size and palette. One thing is good for is figuring out what various drawing routines do; I place a breakpoint on a routine, do a memdumpbin a000:0 fa00
in the debugger to dump the contents of video RAM to a file, step over the function, dump again, then run the script on the “before” and “after” dump to see what changed on the screen. Or modify the value of a routine argument before it is pushed onto the stack, then trigger the routine and see what the argument controls. Here for example I am figuring out the (x,y) coordinate arguments of the drawString
routine:
Another thing it can do is display arbitrary data as pixels - I was hoping to see font or some other visual data in either START.EXE
or the video driver overlay, but unfortunately that wasn’t the case. Makes for an interesting visual gimmick though:
Gee, that’s a lot of zeros. Also the driver jump trampolines are visible as regular brown dots for 0xEA
far jump opcodes if you strain your eyesight.
In any case, today I wanted to go over some… interesting examples of assembly code that I encountered on my voyage, and the C code they ended up resolving into. So let’s jump into it.
Keeping the numbers up in the air
I encountered this in the routine I ended up calling missionSelect
. It pops up the briefing board with the guy pointing his arm at it as seen above, and lets the player choose the difficulty and the scenario. This particular bit handles the case where the “Other scenario” option was selected to go into a submenu which lets the player pick missions from the Desert Storm scenario disk. The game is going over a hardcoded array of scenario filenames, trying to fopen()
them, and setting a flag in an array for those that were found:
The first eyebrow-raising construct is the cmp-sbb-neg
thing. It took me a while, but turns out this is a known idiom for a branchless check to see if a value was NULL or not. If the file handle (which is in AX
) is NULL, then cmp ax, 1
(which is essentially a subtraction) will set the carry flag (CF
== 1). Next, sbb cx, cx
is confusing because we did not put anything in CX
, but this is just a way to isolate the value of the carry flag, since it essentially means cx = cx - cx - CF
, so ultimately cx = -CF
. Last, we use neg cx
to get a positive value, and CX
becomes 0 if the file handle was originally NULL
, and 1 otherwise, but no conditional jump was required. Pretty neat.
The other thing is how the code reuses the register values for things it already calculated to become inputs into the next step. I will spare the reader an account of all the things that I tried, but in the end, the above assembly is matched perfectly by the following C code:
If you asked “who writes code like this?!”, then I feel you more than you know, but in this case I can understand the developers. Compilers of that era were primitive, and breaking up the complicated expression to write the result into a variable would result in emitting code which does a data write, then a read of that same data to use in the next expression (guess how I know). The dev was trying to keep the numbers dancing in the registers as long as possible to avoid incurring a performance hit. This would not matter with a modern compiler, but it just shows what hoops they needed to jump through back in 1989.
Non-orthogonal BS
Moving on but still in the same routine, here’s a seemingly innocuous series of conditional checks and jumps:
This writes itself:
However, no matter what I tried, the jz
at 0x889
ended up being turned into an inverse jnz+jmp short
:
I have since refactored the code to use structs instead of the FARPTR_CAST_OFFSET
ugliness, which is why it’s a little different in the screenshot, but it does the same thing.
I usually write out an entire routine while looking at the disassembly, then tweak it into matching. This usually does not matter because for the most part, the process is orthogonal - changing a source code line only affects the assembly generated at that location. But there are exceptions. Here, because I had mismatching code later on at 0x907, the jump destination was too far for a short jump (+/-0x7f bytes), so the compiler was forced to emit a short-distance jnz
instead, followed by a non-short jmp
by 0x82 bytes forward. I’ve seen other examples of this, such as the compiler creating storage on the stack for some intermediate values, altering the amount of sub sp, 0x123
done at the entry to a function before the code in question, but it’s fortunately rare. Here, the problem was fixed by getting the code in the loop later to match, which is an interesting problem on its own:
How many ways to write a 3-line loop?
As can (hopefully) be seen in the above screenshot, I’m having problems with this loop code:
I spent 3 days trying to write it out as for/while/do-while, but could not obtain the single jz
at the end - somehow it always ended up with a sequence of jz-jmp short-jmp short
, with minor variations. Finally the golden ticket for this bit as well as the previous one ended up being this:
Putting the latter assignment into the loop condition by itself is not something that occured easily, believe me. This is an another example of the intermediate value reuse paradigm as discussed in the first item, although somewhat less extreme.
An another interesting quirk was that the break
statement inside the loop cannot be a return
even though it ends up going to the routine return anyway. Doing so elided a les bx
for the gameData->difficulty
value - the one from gameData->theater
was reused since it’s the same structure, but the kicker is that the code for the loop and function termination is identical! No idea why it has that effect (an in-line return forcing an evaluation of branch destinations and value propagation along those? It figured out it didn’t need to reload it, but normally it wouldn’t check?), but it’s an another example of “non-orthogonal BS” where later code changes the form of code that comes before it. Huh, I guess they are not that rare after all…
Inlined functions anyone?
For my last trick, I present you this code from a routine which handles player input on the pilot select screen:
It uses the so-called string instruction repne movsw
to copy 16 (10h) 2-byte words from one buffer to another, which is what would be known in C as a memcpy()
. But the thing is, this compiler does not support inline functions, so I would be expecting a normal call
to the C library instead. An another possibility is that this function was manually written in assembly. But it does not smell as such, I was able to get a large portion of it before this construct to match, it calls other functions from the C library, uses stack variables instead of registers…
An another thing is that technically repne movsw
is an illegal instruction. The way it works, is there are two so called prefix opcodes for string instructions like MOVS/CMPS/SCAS/STOS
in the 8086 instruction set which cause those instructions to be executed in a loop, as many times as the number in the CX
register. The first one is 0xF2 and it represents REPNE/REPNZ
, the other one is 0xF3 and it represents REP/REPE/REPZ
. So, technically REP
is the same thing as REPE
or REPZ
, while REPNE
is the same as REPNZ
. However, convention dictates that REP
be paired with MOVS
(memcpy()
) and STOS
(memset()
), while REPE/REPZ/REPNE/REPNZ
with CMPS
(memcmp()/strcmp()
) and SCAS
(strlen()
). The difference between REPE/REPZ
and REPNE/REPNZ
is that while both execute the string instruction CX
-times, the former will terminate when the zero flag ZF
becomes 1, while the latter will do so when the flag becomes zero. As CMPS
or SCAS
are proceeding down the “string”, they affect the flags based on the comparisons performed at individual locations, so this is a way to say “proceed until string end or until 0/not 0”. Because MOVS
and STOS
do not affect flags, it does not (technically) make sense to use REPNE
on them - it should be REP MOVSW
, but it works just as well. Whew. This was hard for me to distill, but I could not find a complete explanation online, so there you go.
So, has this been hand-crafted in assembly after all? Not necessarily, as MSC 5.1 is capable of emitting repne movsw
. Whoa there, didn’t I say it does not support inline functions? Well yes I did, but let me introduce you to high tech from 1989: instrinsic functions!
What it does is, you build your code with /Oi
, and the calls to the limited set of standard library functions will (hopefully) be replaced with inlined versions generated by the compiler. Let’s give it a go then, shall we?
ninja@dell:eaglestrike$ make hello cl /Gs /Oi /c /Foe:\hello.obj hello.c link /M /NOD /I hello.obj,e:\hello.exe,,slibce.lib;
This results in the following code:
It puts the number of bytes in CX (mov cx, 0x6
), then divides it by 2 (shr cx, 1
) to obtain the number of 16-bit words to transfer in bulk with repne movsw
, and if the count was uneven, adc cx, cx/repne movsb
will transfer the remainder. Here, the count is even, but as you can see it’s not smart enough to get rid of the adc
thing. I’ve tried everything that I could think of, set different optimizations like /Oa
(no aliasing present, pinky swear), changed the type of the arrays to int
instead of char
(memcpy()
takes void *
so it should not matter, but maybe the compiler checks it?), tried specifying explicit sizes for the arrays, used pointers instead - nothing worked.
What, you thought I had answers? Sorry, not this time. I have no idea how to make this match. The closest I’ve been able to get was to build with MSC 6.0 - it was smart enough to figure out it didn’t need adc/movsb
any more. But it emitted rep movsw
instead, so it’s still not a match, and ultimately useless. There are nop
instructions before and after this routine, which might mean it comes from a separate C source file of its own, and was built with a different compiler and different flags than the rest of the code, but I don’t know what those might be, and I’m not eager to relive the Great Hunt at this time. What I ended up doing was to ignore this routine and move on – for now. I might get a bathroom epiphany, or I will end up rewriting it into non-identical code at the end. 80% is a long way to go, so I’m not going to waste it getting hung up on this.
Till next time then.