(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)

Nothing terribly exciting happened since the last post. I am (slowly) porting START.EXE functions from assembly to C one by one, doing some additional research using IDA and the dosbox debugger to figure out what specific routines do at the same time. The game is still not runnable, but I’m making progress.

One thing that changed is that I’ve made a lst2ch.py script to convert the listing output of IDA into a C header and source file. That way, whenever I rename routines or variables in IDA, I do not need to carry them over into the header file manually - the header file is now autogenerated. I still need to rename stuff in my source files manually, at least for now, but at least it gives me an autocomplete on all routines and data objects. The source file contains definitions for all the data, with the eventual goal of eventually moving it into C - I’m linking with an autogenerated assembly object that has my ported routines automatically removed for now.

I have also implemented some statistics into the script which show me how much work remains to be done:

ninja@dell:eaglestrike$ make start
tools/lst2ch.py lst/start.lst src/f15-se2 conf/start_rc_conf.py --noc
Writing C header file: src/f15-se2/start.h
Found routines: total: 247, ignored: 53, remaining: 194, ported: 23, need porting: 171
Accumulated routines' size: remaining: 21638/0x5486/100%, ported: 4198/0x1066/19.4%, need porting: 17440/0x4420/80.6%
Found 1013 variables, size = 31504/0x7b10
[...]

The 53 ignored routines are those from the C library and the single-instruction far jump trampoline routines invoking code from the driver overlays. That leaves us with 194 routines to rewrite, either into C or assembly (some of the game’s routines were obviosly written in assembly and hence are impossible to match exactly from C) with hardcoded numerical offsets resolved into correct data and code references. As evident from the output, I have completed 23 routines, totaling around 4KB of code, which amounts to slightly below 20% of the total work - for the first executable alone! Well, nobody said this was going to be quick or easy.

I also made a vga2bmp.py script to turn binary data dumps from the Dosbox debugger into bitmaps, with customizable size and palette. One thing is good for is figuring out what various drawing routines do; I place a breakpoint on a routine, do a memdumpbin a000:0 fa00 in the debugger to dump the contents of video RAM to a file, step over the function, dump again, then run the script on the “before” and “after” dump to see what changed on the screen. Or modify the value of a routine argument before it is pushed onto the stack, then trigger the routine and see what the argument controls. Here for example I am figuring out the (x,y) coordinate arguments of the drawString routine:

beforeafter

Another thing it can do is display arbitrary data as pixels - I was hoping to see font or some other visual data in either START.EXE or the video driver overlay, but unfortunately that wasn’t the case. Makes for an interesting visual gimmick though:

start.exe data

Gee, that’s a lot of zeros. Also the driver jump trampolines are visible as regular brown dots for 0xEA far jump opcodes if you strain your eyesight.

In any case, today I wanted to go over some… interesting examples of assembly code that I encountered on my voyage, and the C code they ended up resolving into. So let’s jump into it.

Keeping the numbers up in the air

I encountered this in the routine I ended up calling missionSelect. It pops up the briefing board with the guy pointing his arm at it as seen above, and lets the player choose the difficulty and the scenario. This particular bit handles the case where the “Other scenario” option was selected to go into a submenu which lets the player pick missions from the Desert Storm scenario disk. The game is going over a hardcoded array of scenario filenames, trying to fopen() them, and setting a flag in an array for those that were found:

startCode1:0781	selectedOther:
startCode1:0781			mov	[bp+count], 4
startCode1:0786			mov	[bp+index], 0
startCode1:078B			jmp	short otherIter
startCode1:078D	otherNext:
startCode1:078D			inc	[bp+index]
startCode1:0790	otherIter:
startCode1:0790			cmp	[bp+index], 4
startCode1:0794			jge	short checkOther
startCode1:0796			mov	bx, plh3d3Ptr
startCode1:079A			mov	si, [bp+index]
startCode1:079D			shl	si, 1
startCode1:079F			mov	si, scenarioCodePtr[si]
startCode1:07A3			mov	al, [si]
startCode1:07A5			mov	[bx], al
startCode1:07A7			mov	bx, plh3d3Ptr
startCode1:07AB			mov	si, [bp+index]
startCode1:07AE			shl	si, 1
startCode1:07B0			mov	si, scenarioCodePtr[si]
startCode1:07B4			mov	al, [si+1]
startCode1:07B7			mov	[bx+1],	al
startCode1:07BA			mov	ax, offset aRb_1 ; "rb"
startCode1:07BD			push	ax
startCode1:07BE			push	plh3d3Ptr
startCode1:07C2			call	_fopen
startCode1:07C5			add	sp, 4
startCode1:07C8			mov	fileHandle, ax
startCode1:07CB			cmp	ax, 1  🔴🔴🔴 ???
startCode1:07CE			sbb	cx, cx 🔴🔴🔴 ???
startCode1:07D0			neg	cx     🔴🔴🔴 ???
startCode1:07D2			mov	bx, [bp+index]
startCode1:07D5			mov	scenarioFoundArr[bx], cl ; place flag in array indicating scenario's presence
startCode1:07D9			or	cl, cl
startCode1:07DB			jz	short otherCont
startCode1:07DD			dec	[bp+count]	; failed to fopen, decrease avail extra	scenario count
startCode1:07E0	otherCont:
startCode1:07E0			push	fileHandle
startCode1:07E4			call	_fclose
startCode1:07E7			add	sp, 2
startCode1:07EA			jmp	short otherNext

The first eyebrow-raising construct is the cmp-sbb-neg thing. It took me a while, but turns out this is a known idiom for a branchless check to see if a value was NULL or not. If the file handle (which is in AX) is NULL, then cmp ax, 1 (which is essentially a subtraction) will set the carry flag (CF == 1). Next, sbb cx, cx is confusing because we did not put anything in CX, but this is just a way to isolate the value of the carry flag, since it essentially means cx = cx - cx - CF, so ultimately cx = -CF. Last, we use neg cx to get a positive value, and CX becomes 0 if the file handle was originally NULL, and 1 otherwise, but no conditional jump was required. Pretty neat.

The other thing is how the code reuses the register values for things it already calculated to become inputs into the next step. I will spare the reader an account of all the things that I tried, but in the end, the above assembly is matched perfectly by the following C code:

// 0x781
for (count = 4, index = 0; index < 4; index++) { // find extra scenarios
    plh3d3Ptr[0] = *scenarioCodePtr[index]; 
    plh3d3Ptr[1] = *(scenarioCodePtr[index] + 1);
    // 0x7c2

    if ((scenarioFoundArr[index] = ((fileHandle = fopen(plh3d3Ptr, aRb_1)) == NULL))) count--; // 😵😵😵
    // 0x7e4
    fclose(fileHandle);
}

If you asked “who writes code like this?!”, then I feel you more than you know, but in this case I can understand the developers. Compilers of that era were primitive, and breaking up the complicated expression to write the result into a variable would result in emitting code which does a data write, then a read of that same data to use in the next expression (guess how I know). The dev was trying to keep the numbers dancing in the registers as long as possible to avoid incurring a performance hit. This would not matter with a modern compiler, but it just shows what hoops they needed to jump through back in 1989.

Non-orthogonal BS

Moving on but still in the same routine, here’s a seemingly innocuous series of conditional checks and jumps:

startCode1:0876			les	bx, gameData
startCode1:087A			cmp	word ptr es:[bx+COMM_BUFFER_THEATER], 6	; 6 == desert storm
startCode1:087F			jz	short checkDifficulty
startCode1:0881			jmp	out
startCode1:0884	checkDifficulty:
startCode1:0884			cmp	word ptr es:[bx+COMM_BUFFER_DIFFICULTY], 4 ; diff 4 is demo mode
startCode1:0889			jz	short out

This writes itself:

// 0x876
// show mission type dialog for desert storm
if (gameData->theater == THEATER_DS && gameData->difficulty != DIFFICULTY_DEMO) {
    // ...

However, no matter what I tried, the jz at 0x889 ended up being turned into an inverse jnz+jmp short:

jump short mismatch

I have since refactored the code to use structs instead of the FARPTR_CAST_OFFSET ugliness, which is why it’s a little different in the screenshot, but it does the same thing.

I usually write out an entire routine while looking at the disassembly, then tweak it into matching. This usually does not matter because for the most part, the process is orthogonal - changing a source code line only affects the assembly generated at that location. But there are exceptions. Here, because I had mismatching code later on at 0x907, the jump destination was too far for a short jump (+/-0x7f bytes), so the compiler was forced to emit a short-distance jnz instead, followed by a non-short jmp by 0x82 bytes forward. I’ve seen other examples of this, such as the compiler creating storage on the stack for some intermediate values, altering the amount of sub sp, 0x123 done at the entry to a function before the code in question, but it’s fortunately rare. Here, the problem was fixed by getting the code in the loop later to match, which is an interesting problem on its own:

How many ways to write a 3-line loop?

As can (hopefully) be seen in the above screenshot, I’m having problems with this loop code:

startCode1:08C8	selectMission:
startCode1:08C8			mov	ax, 4
startCode1:08CB			push	ax		; select
startCode1:08CC			push	missionStr	; string
startCode1:08D0			mov	ax, offset miss_unk2
startCode1:08D3			push	ax		; p2
startCode1:08D4			mov	ax, offset miss_unk1
startCode1:08D7			push	ax		; p1
startCode1:08D8			call	missionMenuSelect
startCode1:08DB			add	sp, 8
startCode1:08DE			mov	missionPick, ax
startCode1:08E1			cmp	ax, 4
startCode1:08E4			jz	short missionIs4
startCode1:08E6			jmp	short out
startCode1:08E8	missionIs4:
startCode1:08E8			mov	ax, 4
startCode1:08EB			push	ax		; select
startCode1:08EC			push	missionStr	; string
startCode1:08F0			mov	ax, offset miss_unk4
startCode1:08F3			push	ax		; p2
startCode1:08F4			mov	ax, offset miss_unk3
startCode1:08F7			push	ax		; p1
startCode1:08F8			call	missionMenuSelect
startCode1:08FB			add	sp, 8
startCode1:08FE			add	ax, 4
startCode1:0901			mov	missionPick, ax
startCode1:0904			cmp	ax, 8
startCode1:0907			jz	short selectMission

I spent 3 days trying to write it out as for/while/do-while, but could not obtain the single jz at the end - somehow it always ended up with a sequence of jz-jmp short-jmp short, with minor variations. Finally the golden ticket for this bit as well as the previous one ended up being this:

    if (gameData->theater == THEATER_DS && gameData->difficulty != DIFFICULTY_DEMO) {
        // 0x890
        scenarioFoundArr[0] = scenarioFoundArr[1] = 0;
        // 0x895
        scenarioFoundArr[2] = scenarioFoundArr[3] = scenarioFoundArr[4] = 1;
        // 0x8ad
        if (missionMenuSelect(missType_unk1, missType_unk2, aMissionType, 0) == 0) {
            // 0x8c2
            nearmemset(scenarioFoundArr, 0, 5);
            // 0x8c8
            do {
                // 0x8d8
                if ((missionPick = missionMenuSelect(miss_unk1, miss_unk2, missionStr, 4)) != 4) break;
                // 0x8f8
            } while ((missionPick = missionMenuSelect(miss_unk3, miss_unk4, missionStr, 4) + 4) == 8);
        }
    }
} // routine ends here

Putting the latter assignment into the loop condition by itself is not something that occured easily, believe me. This is an another example of the intermediate value reuse paradigm as discussed in the first item, although somewhat less extreme.

An another interesting quirk was that the break statement inside the loop cannot be a return even though it ends up going to the routine return anyway. Doing so elided a les bx for the gameData->difficulty value - the one from gameData->theater was reused since it’s the same structure, but the kicker is that the code for the loop and function termination is identical! No idea why it has that effect (an in-line return forcing an evaluation of branch destinations and value propagation along those? It figured out it didn’t need to reload it, but normally it wouldn’t check?), but it’s an another example of “non-orthogonal BS” where later code changes the form of code that comes before it. Huh, I guess they are not that rare after all…

Inlined functions anyone?

For my last trick, I present you this code from a routine which handles player input on the pilot select screen:

startCode1:200E			mov	si, [bp+var_6]
startCode1:2011			mov	cl, 5
startCode1:2013			shl	si, cl
startCode1:2015			push	si
startCode1:2016			lea	di, hallfameBuf[si]
startCode1:201A			lea	si, word_1E366[si]
startCode1:201E			push	ds
startCode1:201F			pop	es
startCode1:2020			mov	cx, 10h
startCode1:2023			repne movsw
startCode1:2025			pop	si
startCode1:2026			inc	[bp+var_6]

It uses the so-called string instruction repne movsw to copy 16 (10h) 2-byte words from one buffer to another, which is what would be known in C as a memcpy(). But the thing is, this compiler does not support inline functions, so I would be expecting a normal call to the C library instead. An another possibility is that this function was manually written in assembly. But it does not smell as such, I was able to get a large portion of it before this construct to match, it calls other functions from the C library, uses stack variables instead of registers…

An another thing is that technically repne movsw is an illegal instruction. The way it works, is there are two so called prefix opcodes for string instructions like MOVS/CMPS/SCAS/STOS in the 8086 instruction set which cause those instructions to be executed in a loop, as many times as the number in the CX register. The first one is 0xF2 and it represents REPNE/REPNZ, the other one is 0xF3 and it represents REP/REPE/REPZ. So, technically REP is the same thing as REPE or REPZ, while REPNE is the same as REPNZ. However, convention dictates that REP be paired with MOVS (memcpy()) and STOS (memset()), while REPE/REPZ/REPNE/REPNZ with CMPS (memcmp()/strcmp()) and SCAS (strlen()). The difference between REPE/REPZ and REPNE/REPNZ is that while both execute the string instruction CX-times, the former will terminate when the zero flag ZF becomes 1, while the latter will do so when the flag becomes zero. As CMPS or SCAS are proceeding down the “string”, they affect the flags based on the comparisons performed at individual locations, so this is a way to say “proceed until string end or until 0/not 0”. Because MOVS and STOS do not affect flags, it does not (technically) make sense to use REPNE on them - it should be REP MOVSW, but it works just as well. Whew. This was hard for me to distill, but I could not find a complete explanation online, so there you go.

So, has this been hand-crafted in assembly after all? Not necessarily, as MSC 5.1 is capable of emitting repne movsw. Whoa there, didn’t I say it does not support inline functions? Well yes I did, but let me introduce you to high tech from 1989: instrinsic functions!

intrinsic functions description in the manual

What it does is, you build your code with /Oi, and the calls to the limited set of standard library functions will (hopefully) be replaced with inlined versions generated by the compiler. Let’s give it a go then, shall we?

#include <memory.h>

char a[] = "foobar";
char b[] = "barfoo";

int main() {
    memcpy(b, a, 0x6);
    return 0;
}
ninja@dell:eaglestrike$ make hello
cl /Gs /Oi /c /Foe:\hello.obj hello.c
link /M /NOD /I hello.obj,e:\hello.exe,,slibce.lib;

This results in the following code:

00000012  B90600            mov cx,0x6
00000015  BF4800            mov di,0x48
00000018  BE4200            mov si,0x42
0000001B  8CD8              mov ax,ds
0000001D  8EC0              mov es,ax
0000001F  D1E9              shr cx,1
00000021  F2A5              repne movsw
00000023  13C9              adc cx,cx
00000025  F2A4              repne movsb

It puts the number of bytes in CX (mov cx, 0x6), then divides it by 2 (shr cx, 1) to obtain the number of 16-bit words to transfer in bulk with repne movsw, and if the count was uneven, adc cx, cx/repne movsb will transfer the remainder. Here, the count is even, but as you can see it’s not smart enough to get rid of the adc thing. I’ve tried everything that I could think of, set different optimizations like /Oa (no aliasing present, pinky swear), changed the type of the arrays to int instead of char (memcpy() takes void * so it should not matter, but maybe the compiler checks it?), tried specifying explicit sizes for the arrays, used pointers instead - nothing worked.

What, you thought I had answers? Sorry, not this time. I have no idea how to make this match. The closest I’ve been able to get was to build with MSC 6.0 - it was smart enough to figure out it didn’t need adc/movsb any more. But it emitted rep movsw instead, so it’s still not a match, and ultimately useless. There are nop instructions before and after this routine, which might mean it comes from a separate C source file of its own, and was built with a different compiler and different flags than the rest of the code, but I don’t know what those might be, and I’m not eager to relive the Great Hunt at this time. What I ended up doing was to ignore this routine and move on – for now. I might get a bathroom epiphany, or I will end up rewriting it into non-identical code at the end. 80% is a long way to go, so I’m not going to waste it getting hung up on this.

Till next time then.