(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)

I recently got the mzdiff tool in mzretools to a point where I can give it one of the game executables and my C reimplementation executable, and it will compare them instruction-by-instruction (allowing for differences in file layout, particularly data and code addresses), so I sat down and started implementing the main() function of START.EXE, the first actual game exe which shows the title screen, lets you select a pilot, a mission, difficulty level, before showing you the targets and exiting (the actual flight is handled by EGAME.EXE). So I’m reviewing the disassembly in IDA and writing my C code based off it, got some instructions to match, which mzdiff helpfully shows, so I’m psyched. Then I hit this instruction:

call far 16b5:0c2f

On the surface, nothing too terrible, it’s a far call (a “long” address outside of the current segment). IDA lets me click the reference to see where it leads:

dataSeg:0C2F sub_0C2F proc far
dataSeg:0C2F                 jmp     far ptr loc_5C30
dataSeg:0C2F sub_0C2F endp
dataSeg:0C34 sub_0C34 proc far
dataSeg:0C34                 jmp     far ptr loc_5C35
dataSeg:0C34 sub_0C34 endp
dataSeg:0C39 sub_0C39 proc far
dataSeg:0C39                 jmp     far ptr loc_5C3A
dataSeg:0C39 sub_0C39 endp

Whoops, we’re executing a part of the data segment as code! Nowadays the OS will not let you do that, we have all kinds of DEP mechanisms to prevent data execution, but this is DOS, so we don’t care. Actually, it makes a lot of sense to have these “trampoline” procedures in the data segment. I did not describe the game’s usage of overlays in detail yet, but it contains several .EXE files containing drivers for the various types of graphics adapters that the IBM PC could use in those days (CGA, EGA, MCGA), same for sound cards and a single “varia” library of small utility functions - MISC.EXE. At runtime, those .EXEs are loaded into memory with DOS system function 21.4B by F15.COM with the AL=03 (overlay) mode, which means the file is loaded into memory, but not executed. Each of those contains a header which says (among others) how many functions it has, and then a list of offsets into those functions (different drivers have functions at different offsets). So, every game exe will look at the header and set up a jump table like the one above by patching the far jump instructions’ segment to the overlay load segment, and the subsequent offsets to those found in the overlay’s header list. Then, a game exe can just say “call graphics function N”, which will be a far call into the patched data segment, which will do a far jump into the overlay code. Then the overlay function is responsible for returning to the original call location directly. So, we need this jump table data to both function as code (because we need to call it to jump into overlays), and data (because we need to patch in addresses that are only known at run time).

So the question is now, how to write C code that will emit a far call into the data segment? My first approach was something like this:

// 0xea is the 'jmp far' opcode, then 4 placeholder bytes for the offset and segment
uint8 test[] = { 0xea, 0x12, 0x34, 0x56, 0x78 }; 
typedef void(far *func)(void);
int main(void) {
    // cast data segment address to a function pointer and call
    ((func)test)();
}

Unfortunately, this ends up generating this assembly:

mov ax,0x42 ; the offset of 'test' in the data segment
mov dx,ds
mov [bp-0x10],dx ; put segment and offset on the stack
mov [bp-0x12],ax
call far [bp-0x12] ; call through temporary pointer placed on the stack

I tried tweaking things, but could not get it to work. Then somebody on the retrocomputing Stack Exchange noticed that putting a far function into a separate module (different .c file) and calling it from the main one will result in a far call just like I need under memory models which allow for more than one code segment (medium, large or huge memory model - actually the MS C 5.0 compiler docs say every module will get its own code segment).

// test.h
far void function();

// test.c
far void function() {
}

// main.c
#include "test.h"

int main()  {
    function(); // will generate a far call with an immediate segment:offset address in the instruction
    return 0;
}

This is not exactly what I need, but it’s a hint on how the linker works; it will find addresses for known functions in other modules, and if it needs to put them in a different segment (due to the memory model being used), it will generate a far call with the segment address. So what if I put a byte array in a different module and cast it to a far function pointer?

// test.h
// utility typedef to simplify the casting
typedef void (far * function)();
extern far uint8 data[];

// test.c
far uint8 data[] = { 0xea, 0x12, 0x34, 0x56, 0x78 }; 

// main.c
#include "test.h"

int main()  {
    // cast the address of the far array into a far function pointer and call it
    ((function)data)();
    return 0;
}

This again generates the code where a temporary pointer is created on the stack, DS and the offset are pushed and a call goes through that, so I’m screwed.

At this point, somebody else on Stack Exchange suggested an Evil Genius technique:

// test.h
// 'function' is a function, I promise!
void far function();

// test.c
// mwahaha, JK, it's really data
far uint8 function[] = { 0xea, 0x12, 0x34, 0x56, 0x78 }; 

// main.c
#include "test.h"

int main()  {
    function(); // call the data directly with no cast
    return 0;
}

Despite being completely evil due to the fact that the declaration in the header file is a deliberate mismatch with the actual definition in the .c file, this does generate the exact code that I want. So, victory? Not quite. The graphics driver has 84 functions, the misc has 6, and the sound has 10, and the jump table seems to have some spare room left, so there appear to be 110 slots, and I need to create 110 functions. Let’s just start with three:

// test.h
void far function1();
void far function2();
void far function3();

// test.c
far uint8 function1[] = { 0xea, 0x12, 0x34, 0x56, 0x78 }; 
far uint8 function2[] = { 0xea, 0x12, 0x34, 0x56, 0x78 }; 
far uint8 function3[] = { 0xea, 0x12, 0x34, 0x56, 0x78 }; 

// main.c
#include "test.h"

int main()  {
    function1();
    function2();
    function3();
    return 0;
}

This would be tedious, but I could make it work with some preprocessor mischief thrown in. Unfortunately, the linker decided to put some padding between the individual arrays, so there were null bytes following them. It’s weird because the compiler docs say arrays of char are byte aligned, but I was there, I saw it. I could account for the extra byte elsewhere, but what if some day when the exe grows bigger, the compiler will decide to drop the padding? This is not sustainable. Time to get creative.

// test.h
#define JMPTABLE_SLOT_SIZE 5 // 0xea far jump opcode + 4 bytes for far address
#define JMPTABLE_SLOT_COUNT 110
typedef void (far* OverlaySlot)();

// the mismatched declaration lying about the jumptable being a function
void FAR overlay_jumptable();

// a macro to cast the jumptable address into a pointer to a byte so that I can do pointer arithmetic on it;
// calculate the position of the location of a particular function and cast it back to a far function pointer
#define OVERLAY_FUNC(index) ((OverlaySlot)(((uint8 far*)overlay_jumptable) + (index * JMPTABLE_SLOT_SIZE)))

// test.c
// just put the jumptable data all together in one linear array to avoid padding being added by the compiler
far uint8 overlay_jumptable = { 
    0xea, 0x12, 0x34, 0x56, 0x78,
    0xea, 0x12, 0x34, 0x56, 0x78,
    0xea, 0x12, 0x34, 0x56, 0x78,
    // ... 110 times
};

// main.c
#include "test.h"

int main()  {
    OVERLAY_FUNC(1)();
    OVERLAY_FUNC(2)();
    OVERLAY_FUNC(3)();
    return 0;
}

Surprisingly enough, when looking at the disassembly, all the 3 far calls that result from this are to the same far address, which is the address of overlay_jumptable. Did I make a mistake in the casting macro?

// main.c
#include "test.h"

int main()  {
    // print out the value of the pointers to the functions
    printf("slot1 = %p\n", OVERLAY_FUNC(1));
    printf("slot1 = %p\n", OVERLAY_FUNC(2));
    printf("slot1 = %p\n", OVERLAY_FUNC(3));
    return 0;
}

This yielded an output like below:

slot1 = 0123:00a4
slot2 = 0123:00a9
slot2 = 0123:00ae

The addresses are 5 bytes apart like expected when calculated, but when actually called, the linker will ignore the shifted value, probably because there is not an exported known symbol at that address. I don’t know why it uses the “next best thing” by using the address of the beginning of the array, but we’re dancing fandango on the precipice of a cliff here, poking undefined behaviour of the compiler with a stick. But again, this is DOS, an OS for Real Programmers who aren’t afraid of stuff like that.

This means however that I exhausted the possibilities that C in general and the compiler in particular let me accomplish. For getting the exact memory layout that I want, I’m going to need to go into assembly. I was considering it as soon as I hit that first roadblock, because that data-is-code idiom smells of somebody having coded a part of it in assembly, but I was hoping it could be accomplished from C, and abusing the compiler was really fun while it lasted. But now I need to take out the big boys’ toys. I extended my DOS build script to support MASM 5.10, and made this:

; the underscores are an interesting feature of the MS C compiler which will add them
; to all global symbols under the hood, so these would not have been found at link time
; without an underscore, despite being known by their _-less names in C code
PUBLIC _function1, _function2, _function3
DOSSEG
.MODEL SMALL

; define a segment beginning at memory address 0 (aka IVT, interrupt vector table).
; the assembler will not let me pass an immediate address
; into the far jump instruction, as in 'jmp far 0:0' (error A2038: Left operand must have segment)
ivt segment at 0h
org 0
begin label far
ivt ends

; MASM supports "simplified segments", where you just put .DATA on your data segment,
; but I need to be specific here to make sure this will be a part of the data segment
trampoline segment public 'DATA'

_function1 proc far
    jmp ivt:begin
_function1 endp

_function2 proc far
    jmp ivt:begin
_function2 endp

_function3 proc far
    jmp ivt:begin
_function3 endp

trampoline ends

; I REALLY want the linker to merge my 'trampoline' segment with other data segments when putting together the final executable
DGROUP group trampoline
    ASSUME DS:DGROUP

END

Then I just need a header file to call these from C:

// test.h
// remember, the underscore is implicit
void function1();
void function2();
void function3();

// main.c
int main() {
    function1();
    function2();
    function3();
    return 0;
}

Well, this does not work. The linker made ‘trampoline’ into a code segment of its own, probably because when it sees proc, it assumes - CODE!. This is getting ridiculous. If you won’t play nice, neither will I.

PUBLIC _function1, _function2, _function3
DOSSEG
.MODEL SMALL

; didn't even need a special segment definition this time, just go with the defaults.
.DATA
_function1 db 0EAh
    dw 0h, 0h
_function2 db 0EAh
    dw 0h, 0h
_function3 db 0EAh
    dw 0h, 0h    
END

Finally, when linked with the rest of the program, this does what I wanted in the very beginning. I just spelled out a bunch of byte values as data in the assembly, exposed them as public symbols, and lied to the compiler that they are really functions. Using assembler lets me control the exact layout so there will not be any alignment padding, and I can have my cookie and eat it, too.

So everything is well and good, time to move on, right? Right? Well of course not! The far call is there, but I am missing the subsequent add sp, 0x4 instruction which takes the arguments off the stack after the far call is done. This is the C code:

// overlay.h
int far gfx_overlay_4b(); 

// start.c
uint8 hercFlag; // hercules mode
int main() {
    // [...]
    // the far call to the overlay that does not generate the stack cleanup
    gfx_overlay_4b(*FARADDR_PTR_OFF(uint16, commAddr, COMM_GFXINIT_RESULT_OFFSET), 2);
    // next line is a load from memory into a global variable
    hercFlag = *FARADDR_PTR_OFF(uint8, commAddr, COMM_SETUP_MONOCHROME_OFFSET);

    // the rest of the recreation of main() is not done yet...

    return 0;
}

…and here is the comparison of the generated assembly (on the right) with the original game exe’s instructions (on the left). I am using my tool (mzdiff) from mzretools to compare the binaries, the ‘==’ mark means an exact instruction match, ‘~=’ or ‘=~’ mean an inexact match with a difference in either the left or the right operand, and ‘!=’ is of course a mismatch which interrupts the tool, telling me that I need to fix a mistake.

ninja@dell:eaglestrike$ make verify
cl /c /Gs  /Foe:\start.obj f15-se2\start.c
link /M  start.obj slot.obj,d:\start.exe,,,
../mzretools/debug/mzdiff ida/start.exe:10 build-f15-se2/start.exe:10 --verbose --loose --sdiff 1 --map map/start.map
Comparing code between reference (entrypoint 1000:0010/010010) and other (entrypoint 1000:0010/010010) executables
Reference location @1000:0010/010010, routine 1000:0010-1000:0482/000473: main, block 010010-010482/000473, other @1000:0010/010010
1000:0010/010010: push bp                          == 1000:0010/010010: push bp
1000:0011/010011: mov bp, sp                       == 1000:0011/010011: mov bp, sp
1000:0013/010013: sub sp, 0x1c                     =~ 1000:0013/010013: sub sp, 0xe
[...]
1000:0081/010081: mov ax, 0x2                      == 1000:0080/010080: mov ax, 0x2 
1000:0084/010084: push ax                          == 1000:0083/010083: push ax 🟢 second function argument pushed onto stack
1000:0085/010085: les bx, [0x77f2]                 =~ 1000:0084/010084: les bx, [0x3fc] 
1000:0089/010089: push es:[bx+0x20]                == 1000:0088/010088: push es:[bx+0x20] 🟢 likewise for first argument
1000:008d/01008d: call far 0x16b50c2f              ~= 1000:008c/01008c: call far 0x105b01b9 🟢 far call into the data segment
1000:0092/010092: add sp, 0x4                      != 1000:0091/010091: les bx, [0x3fc] 🔴 no stack cleanup! wtf?
ERROR: Instruction mismatch in routine main at 1000:0092/010092: add sp, 0x4 != 1000:0091/010091: les bx, [0x3fc]
make: *** [Makefile:66: verify-start] Error 1

There are different function calling conventions, which stipulate who (caller or callee) is supposed to clean up the stack, and the MS C functions use the cdecl conventions which mean the caller is supposed to clean up, but the compiler also supports the pascal and fortran conventions. Maybe I need to specify cdecl explicitly on the function prototype?

// overlay.h
int far cdecl gfx_overlay_4b();

It took an evening of deep thought, but finally I figured it out. Bottom line, the declaration in the header file does not matter. I tried adding the argument types explicitly, removed the return type, tried switching the order of the far and cdecl specifiers. It turns out that the compiler is just lazy, and it will not emit an instruction to do stack cleanup if that’s the last call in the file - why bother? So it’s just a matter of adding a dummy call to another function in the C source for the comparison to pass, to be removed later.

int main() {
    // [...]
    gfx_overlay_4b(*FARADDR_PTR_OFF(uint16, commAddr, COMM_GFXINIT_RESULT_OFFSET), 2);
    hercFlag = *FARADDR_PTR_OFF(uint8, commAddr, COMM_SETUP_MONOCHROME_OFFSET);

    // XXX: force emit of stack cleanup for last function, 
    // remove after main() recreation is complete
    dummy_function();

    return 0;
}

And now it finally fits:

ninja@dell:eaglestrike$ make verify
[...]
1000:0081/010081: mov ax, 0x2                      == 1000:0081/010081: mov ax, 0x2
1000:0084/010084: push ax                          == 1000:0084/010084: push ax
1000:0085/010085: les bx, [0x77f2]                 =~ 1000:0085/010085: les bx, [0x3fc]
1000:0089/010089: push es:[bx+0x20]                == 1000:0089/010089: push es:[bx+0x20]
1000:008d/01008d: call far 0x16b50c2f              ~= 1000:008d/01008d: call far 0x105c01b9
1000:0092/010092: add sp, 0x4                      == 1000:0092/010092: add sp, 0x4 🟢 hooray!

This is just some work that needed to be done to reimplement a couple of instructions of the game in C. And in the end it even turned out that C was not enough, but I’m okay with a small snippet of assembly that just contains a block of data to be linked with the rest of the recreation code. Actually it’s not that bad, because there will be many places in the code where overlays will get called, and doing this research up front lets me cover all those easily. I just thought it was interesting to show how much work goes into figuring minute stuff like this out, before transcribing the game disassembly into C becomes an almost mechanical task. I’m not at that point yet, but I’m excited for what will come next.

An aspect of doing in-depth analysis of the code in this way, that I did not expect, is that one can’t help but imagine what went through the head of the person who wrote this code over 30 years ago. I’m betting some of the same thoughts went through their head. Should I copy-paste these functions, or should I make a macro? How best to reuse the code that will be shared between the multiple binaries? What kind of compiler and linker options do I use? It is kind of like going back in time and being there with the programmer as they are figuring out how to make this game work, and is part of why this project fascinates me so much.

You are an interesting man to not know, Mr. Hollis.