More progress on EGAME

(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)

I am happy to say the reconstruction for EGAME.EXE is progressing smoothly. I was planning to do an update after reaching 10% completion, but I work in increments of routines, and finishing the previous one only took me past 9%, so I needed to do one more before celebrating. It was a pretty hefty one (sub_18E50, at least called that for now), so I’m happy to report we are actually sitting around 14%, with about 10k of opcodes transcribed into C and 52k to go:

--- Routine map stats (static):
Load module of executable is 167792/0x28f70 bytes
Routine map of 400 routines covers 71986/0x11932 bytes (42% of the load module)
Reachable code totals 73063/0x11d67 bytes (101% of the mapped area)
Unreachable code totals 202/0xca bytes (0% of the mapped area)
Excluded 122 routines take 5104/0x13f0 bytes (7% of the mapped area)
Reachable area of excluded routines is 5281/0x14a1 bytes (7% of the reachable area)
--- Comparison run stats (dynamic):
Seen 82 routines, visited 49 and compared 9662/0x25be bytes of opcodes inside (13% of the reachable area)
Ignored (seen but excluded) 33 routines totaling 1122/0x462 bytes (1% of the reachable area)
Practical coverage (visited & compared + ignored) is 10784/0x2a20 (14% of the reachable area)
Theoretical(*) coverage (visited & compared + reachable excluded area) is 14943/0x3a5f (20% of the reachable area)
Missed (not seen and not excluded) 229 routines totaling 52517/0xcd25 bytes (72% of the covered area)
(*) Any routines called only by ignored routines have not been seen and will lower the practical score,
    but theoretically if we stepped into ignored routines, we would have seen and ignored any that were excluded.

Mind that some of that 52k will turn out to be assembly so not target for the reconstruction per se, but will need porting to C anyway. So ultimately, more work, just later.

Seeing as it’s been almost exactly a month since I’ve started actually transcribing the code (it was a lot of preparatory work to get the new executable building in my framework before that), that means that I’m roughly able to do 10k in a month, so theoretically if I went all in, I could probably finish the transcription within 5-6 months. Incidentally, this 10k/month lines up with what Fabien Sanglard based his calculations on when considering whether reversing Strike Commander would be possible, before deciding it would take too many years. I think he might have been a bit pessimistic, because of those binaries’ size, a lot is bound to be data, some might be libc… But I’m still happy my game is smaller. 😁

So, will EGAME be reconstructed around October? I really doubt it, it’s probably going to take a year or more. I actually plan to take it easy for a while, with summer around the corner and more family functions and activities that inevitably brings. I also want to play a few games that I bought on Steam a while back but didn’t even have time to check out. I’m not likely to stop completely, I’m just managing expectations (my own, mostly 😉) to clarify that I’m not going to be able to maintain this pace. But the future is looking bright, so don’t fret.

With that out of the way, let’s look at some interesting code snippets that have popped up while doing the reconstruction.

[bx+si]

I found this in the routine which loads .3dt (terrain) files. I don’t understand the format for now, but these functions will go a long way towards figuring it out one day.

    mov	    ax,	[bp+var_8]
    ; ...
    mov	    si,	[bp+var_4]
    mov	    cl,	6
    shl	    si,	cl
    mov	    bx,	ax
    shl	    bx,	1
    mov	    ax,	[bp+var_6]
    add	    ax,	offset buf1_3dt
    mov	    word_1234[bx+si],	ax
    mov	    [bp+var_A],	0

After putting one index into si and another into bx, the code reaches into what looks like an array of integers using the [bx+si] memory addressing mode. But why didn’t it just build the index in one register? I tried various ways of writing the indexing expression before realizing this is actually a 2-dimensional array. The first shift by 6 is multiplying the column value in var_4 by 64, which is the size of the row (32) times the size of the element (uint16: 2), then another shift on the row index multiplies it by 2 to get the final offset. It’s actually lucky that it uses a different indexing mode for matrices, else I might have had difficulties recognizing it as such. Now I can rename and declare word_1234 as something more meaningful:

extern int matrix3dt_2[5][32];

void load3DT(char *arg_0) {
    // ...
    matrix3dt_2[var_4][var_8] = (int16)(buf1_3dt + var_6);
    // ...
}

sub-sbb-and-add-and-…what?

This time we is in the .3d3 file loading routine, which I suspect are 3d models. Either way:

    mov	    ax,	size3d3_2
    sub	    ax,	800h
    sbb	    cx,	cx
    and	    ax,	cx
    add	    ah,	8
    mov	    [bp+var_A],	ax

We’ve seen sub-sbb-neg used by the compiler before as a way to perform branchless NULL checks, so that’s a hint it might be trying something similar. I’m using my old trusted technique of plotting the values for the significant cases. Here, it’s clearly trying to compare something against 0x800, so let’s pick one value below, and one above it:

Instruction	Value (ax=0x1234)	Value (ax=0x200)
mov ax, size3d3_2	ax=0x1234	ax=0x200
sbb cx, cx	cx=0	cx=0xffff
and ax, cx	ax=0	ax=0xfa00
add ah, 8	ax=0x800	ax=0x200

Seeing the values makes it crystal clear that it’s just clamping the value to the range 0-0x800. This simple code matches the binary arithmetic mess exactly:

var_A = (size3d3_2 >= 0x800) ? 0x800 : size3d3_2;

I really need a clever heading for this

Still in the .3d3 routine:

    mov	    ax,	[bp+var_A] ; part of an earlier calculation
    add	    [bp+var_E],	ax
    cmp	    [bp+var_10], 0
    jnz	    short loc_12C56
    mov	    si,	size3d3
    shl	    si,	1
    add	    ax,	buf3d3[si] ; new calculation reuses value of var_A already in ax
    mov	    (buf3d3+2)[si], ax

The code is pretty simple:

var_E += var_A;
if (var_10 == 0) {
    // 2c52
    buf3d3[size3d3+1] = buf3d3[size3d3] + var_A;
} // 2c56

…but the question is how to get the value of stack variable var_A in register ax from before the conditional jump to propagate to the addition operation within the conditional code. I tried in vain, but the compiler would reload the value of buf3d3[si] into ax, and add var_A to it.

I really don’t remember too well how I got the idea, but when all else fails, try flipping signedness. Changing the declaration of buf3d3 into extern unsigned int buf3d3[] solves this one, but don’t ask me why.

Haven’t had enough binary arithmetic magic yet?

This routine ostensibly processes the cases for view switching in the plane:

    mov	    ax,	word_330C4
    inc	    ax
    cwd
    sub	    ax,	dx
    sar	    ax,	1
    mov	    cx,	word_336E8
    sub	    cx,	ax
    dec	    cx
    and	    cx,	0Fh
    mov	    [bp+var_E],	cx

After seeing this, I got a bit nauseous, so decided it was time I had a stimulating conversation with my pal ChatGPT. I mean, obviously we’re trying to do some operation on a doubleword, but why subtract the older half from the younger? Anyway, they told me that:

sub ax, dx
    This is the interesting part.
    Since dx is either 0x0000 (for positive ax) or 0xFFFF (for negative ax), the subtraction effectively does:
        If ax was non-negative (dx = 0x0000), then ax remains unchanged.
        If ax was negative (dx = 0xFFFF), then ax = ax - (-1) = ax + 1.
            This effectively cancels the inc ax instruction for negative values.
This adjustment ensures that the rounding behavior for division by 2 is more symmetric.
Normally, integer division truncates toward zero, but this modification makes negative numbers round more correctly toward the mathematical floor.
Without sub ax, dx, a negative odd value would round incorrectly due to simple truncation.

Are they right? Who knows, right? It sounds so smart that I’m inclined to believe it. 😉 In any case, my buddy kindly made a table of values just like I enjoy, and it looks like it really is just a way to get division with sar to line up – I should really know better by now that ax:dx does not always equal doubleword, sometimes it’s just plain division. There’s really nothing interesting to the matching code:

var_E = (word_336E8 - ((word_330C4  + 1) / 2) - 1) & 0xf;

Divsion came knocking again

I don’t even know what this routine does (yet). Within, we have this function call with a conditional in the middle of the arguments getting pushed onto the stack.

    mov	    ax,	0Fh
    push    ax
    mov	    ax,	36h ; '6'
    push    ax
    mov	    ax,	0E4h ; '�'
    push    ax
    cmp	    word_380D0,	64h ; 'd'
    jnb	    short loc_192D5
    mov	    ax,	word_380D0
    jmp	    short loc_192E7
loc_192D5:
    mov	    ax,	word_380D0
    sub	    dx,	dx  ; extend into dx
    mov	    cx,	5 ; ...divide by 5
    div	    cx
    mov	    cx,	ax 
    shl	    ax,	1 
    shl	    ax,	1 ; ...multiply by 4
    add	    ax,	cx ; ...add one more 🎵
loc_192E7:
    push    ax
    call    sub_1A183

The troublesome bit is the add ax, cx. For some reason, I kept writing it as:

sub_1A183(word_380D0 < 0x64 ? word_380D0 : (word_380D0 / 5) * 4 + (word_380D0 / 5), 0xe4, 0x36, 0xf);

This does not match, the division is repeated, before adding the value. It took a while to click: multiply it by 4, add one more time makes 5!

sub_1A183(word_380D0 < 0x64 ? word_380D0 : (word_380D0 / 5) * 5, 0xe4, 0x36, 0xf);

This is as match. Looks dumb mathematically which is why I initially rejected it, but quickly remembered this is binary division, so this will strip the remainder off the value, performing rounding to a multiple of 5, essentially the equivalent of word_380D0 - (word_380D0 % 5).

Relax, that was the last one

I wanted to conclude this entry with some general discoveries. Mind that I’m not only doing the reconstruction, but investigation in IDA to properly mark variables, add declarations for automatic C header generation etc. Usually, I try not to follow the rabbit holes too deep and focus on the routine that I’m currently looking at, but sometimes a piece of information is missing, and I need to search elsewhere. At some point, I was trying to figure out the layout of some struct data, but the code in the current routine was only accessing two members out of 8, so I had to cast a wider net. This way, I finally found myself in this interesting code:

; ...
loc_1D747:
    cmp	    ax,	266Ch
    jnz	    short loc_1D74F
    jmp	    keyL_1D31D
loc_1D74F:
    jbe	    short loc_1D754
    jmp	    loc_1D7EE
loc_1D754:
    cmp	    ax,	1970h
    jnz	    short loc_1D75C
    jmp	    keyP_1D605
loc_1D75C:
    ja	    short loc_1D79E
    cmp	    ax,	1177h
    jnz	    short loc_1D766
    jmp	    keyW_1D5AF
loc_1D766:
    ja	    short loc_1D77B
    cmp	    ax,	11Bh
    jnz	    short loc_1D770
    jmp	    keyEsc_1D6B6
loc_1D770:
    cmp	    ax,	0E08h
    jnz	    short loc_1D778
    jmp	    keyBkspc_1D641
; ...

This is a long series of checks of ax against seemingly arbitrary values which was orignally probably a switch statement, but I realized these were key scan codes. This is the routine for dispatching keypresses! This is immensely helpful, because I know what the keys do in the game, so I can infer (even if broadly) the purpose of the routines that are invoked by the cases. I did some initial poking around and found a bunch of interesting avenues for further research. As it is with IDA, figuring out one bit in one place will sometimes unlock whole other areas for you, and you keep on doing that until you’re done.

Fat routines

I wanted to talk about are the sizes of the routines I’ve been encountering. Some are pretty significant. Last one I finished was almost 1600 bytes of code, which is not dramatic, but not trivial either, especially as it has multiple nested conditions and loops inside, which reminds me about how I used to need to do desperate stuff to get the control flow down. The current routine that I am supposed to do is over 4700 bytes! It’s one of the key dispatch routines I was excited about (there are actually two separate ones, not sure why yet). Anyway, this hints at maybe why why could not find many routines from Fleet Defender in the F15SE2 code - a lot of this code appears to be huge, sprawling routines that cover many aspects of the game logic. Perhaps at some point there was a refactor of this codebase into more manageable, smaller bits, which would not be found with routine signatures, and would be little help in reconstructing the code even if we could find them. Perhaps the old, ugly code was decided to be too F15-specific, so it was thrown away at some point in time of the codebase’s lifetime as cruft, and did not carry over to either F15SE3, or some other step along the way. I’ll probably never know, but this kind of makes sense and closes down the quest for duplicate searching against the F-14 code leak.

Transcription 101

By the way, how am I managing to write all this convoluted code without getting confused? Recently, while doing the reconstruction, I started following a sort-of formalized approach to transcribing the code. It’s a pretty small detail, but enough of a gamechanger for me that I wanted to mention it. Basically, while working on a routine, I will write out hex offsets of the corresponding assembly opcodes as comments in the source code. Not for every line, but it’s very helpful if I ever need to go back from an offset to a place in the C code, which pops up pretty often. Anyway, this isn’t new, I’ve been doing it since the beginning. What I started doing was writing an offset comment on every opening and closing brace, and closing the braces immediately when I encounter them:

int sub_18E50(int arg_0) {
    int var_2, var_4, var_6, var_8, var_A, var_C, var_E, var_10, var_12, var_14, var_16, var_18, var_1A;
    char var_1C;
    byte_3C5A0 = gfx_jump_2d();
    var_16 = waypoints[waypointIndex].field_0 - word_3BEC0;
    var_1A = waypoints[waypointIndex].field_2 - word_3BED0;
    // 8e83
    word_3BE92 = sub_1D008(var_16, -var_1A);
    if (word_330C2 != 0) { // 8e96
        if (word_38FEA != 0) { // 8e9d
            word_38FEA = 0;
            if (!(keyValue & 0x80)) { // 8eaa
                sub_19E44(0xd);
                sub_19E5D(0, 0, 0x13f, 0x60);
                gfx_jump_4f(0x3c);
            } // 🔵 nothing here because it's the same as below, 0x8ed2
        } // 8ed2
        byte_37C2F = 1;
        if (keyValue == 0 && byte_37C24 == 0) { // 8eeb
            if (!commData->setupUseJoy) { // 8ef9
                sub_19E44(0);
                // 🟢🟢🟢 working here
            } // 8fce                
        } // 93c4 // 🔵 have all these marked out in advance
    } // 93cf // 🔵
} // 9485 // 🔵

Having these braces fixed at both ends as soon as I encounter the opening one lets me keep my bearings even inside pretty complex control structures. When I see a location in the IDA listing, I can immediately check if it matches some of my currently open blocks’ ending, and hence that I should move out of its scope. Likewise, I have a convention for IDA, where I rename its loc_01234 labels to if/else/endif/loop/..._01234 to make the listing more readable, but that’s not as important in my view, and I only do it where it’s especially difficult to figure stuff out.

This of course assumes unoptimized code without any nastiness like deduplication or code block reordering. But luckily enough, most of the code I’ve seen so far has been compiled with the elusive /Zi flag, so the opcodes pretty much follow the C code one-to-one. But I’m sure something will surprise me one day. Oh well, even in such case following this pattern for as long as possible lets me know something is afoot when the jump sequence does not make sense.

Thanks for reading, I’ll update around the 20-30% mark or if there is anything of interest.