Chasing far calls
(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)
I recently got the mzdiff tool in mzretools to a point where I can give it one of the game executables and my C reimplementation executable, and it will compare them instruction-by-instruction (allowing for differences in file layout, particularly data and code addresses), so I sat down and started implementing the main()
function of START.EXE
, the first actual game exe which shows the title screen, lets you select a pilot, a mission, difficulty level, before showing you the targets and exiting (the actual flight is handled by EGAME.EXE
). So I’m reviewing the disassembly in IDA and writing my C code based off it, got some instructions to match, which mzdiff helpfully shows, so I’m psyched. Then I hit this instruction:
On the surface, nothing too terrible, it’s a far call (a “long” address outside of the current segment). IDA lets me click the reference to see where it leads:
Whoops, we’re executing a part of the data segment as code! Nowadays the OS will not let you do that, we have all kinds of DEP mechanisms to prevent data execution, but this is DOS, so we don’t care. Actually, it makes a lot of sense to have these “trampoline” procedures in the data segment. I did not describe the game’s usage of overlays in detail yet, but it contains several .EXE files containing drivers for the various types of graphics adapters that the IBM PC could use in those days (CGA, EGA, MCGA), same for sound cards and a single “varia” library of small utility functions - MISC.EXE
. At runtime, those .EXEs are loaded into memory with DOS system function 21.4B by F15.COM
with the AL=03
(overlay) mode, which means the file is loaded into memory, but not executed. Each of those contains a header which says (among others) how many functions it has, and then a list of offsets into those functions (different drivers have functions at different offsets). So, every game exe will look at the header and set up a jump table like the one above by patching the far jump instructions’ segment to the overlay load segment, and the subsequent offsets to those found in the overlay’s header list. Then, a game exe can just say “call graphics function N”, which will be a far call into the patched data segment, which will do a far jump into the overlay code. Then the overlay function is responsible for returning to the original call location directly. So, we need this jump table data to both function as code (because we need to call it to jump into overlays), and data (because we need to patch in addresses that are only known at run time).
So the question is now, how to write C code that will emit a far call into the data segment? My first approach was something like this:
Unfortunately, this ends up generating this assembly:
I tried tweaking things, but could not get it to work. Then somebody on the retrocomputing Stack Exchange noticed that putting a far function into a separate module (different .c file) and calling it from the main one will result in a far call just like I need under memory models which allow for more than one code segment (medium, large or huge memory model - actually the MS C 5.0 compiler docs say every module will get its own code segment).
This is not exactly what I need, but it’s a hint on how the linker works; it will find addresses for known functions in other modules, and if it needs to put them in a different segment (due to the memory model being used), it will generate a far call with the segment address. So what if I put a byte array in a different module and cast it to a far function pointer?
This again generates the code where a temporary pointer is created on the stack, DS
and the offset are pushed and a call goes through that, so I’m screwed.
At this point, somebody else on Stack Exchange suggested an Evil Genius technique:
Despite being completely evil due to the fact that the declaration in the header file is a deliberate mismatch with the actual definition in the .c file, this does generate the exact code that I want. So, victory? Not quite. The graphics driver has 84 functions, the misc has 6, and the sound has 10, and the jump table seems to have some spare room left, so there appear to be 110 slots, and I need to create 110 functions. Let’s just start with three:
This would be tedious, but I could make it work with some preprocessor mischief thrown in. Unfortunately, the linker decided to put some padding between the individual arrays, so there were null bytes following them. It’s weird because the compiler docs say arrays of char are byte aligned, but I was there, I saw it. I could account for the extra byte elsewhere, but what if some day when the exe grows bigger, the compiler will decide to drop the padding? This is not sustainable. Time to get creative.
Surprisingly enough, when looking at the disassembly, all the 3 far calls that result from this are to the same far address, which is the address of overlay_jumptable
. Did I make a mistake in the casting macro?
This yielded an output like below:
slot1 = 0123:00a4
slot2 = 0123:00a9
slot2 = 0123:00ae
The addresses are 5 bytes apart like expected when calculated, but when actually called, the linker will ignore the shifted value, probably because there is not an exported known symbol at that address. I don’t know why it uses the “next best thing” by using the address of the beginning of the array, but we’re dancing fandango on the precipice of a cliff here, poking undefined behaviour of the compiler with a stick. But again, this is DOS, an OS for Real Programmers who aren’t afraid of stuff like that.
This means however that I exhausted the possibilities that C in general and the compiler in particular let me accomplish. For getting the exact memory layout that I want, I’m going to need to go into assembly. I was considering it as soon as I hit that first roadblock, because that data-is-code idiom smells of somebody having coded a part of it in assembly, but I was hoping it could be accomplished from C, and abusing the compiler was really fun while it lasted. But now I need to take out the big boys’ toys. I extended my DOS build script to support MASM 5.10, and made this:
Then I just need a header file to call these from C:
Well, this does not work. The linker made ‘trampoline’ into a code segment of its own, probably because when it sees proc
, it assumes - CODE!. This is getting ridiculous. If you won’t play nice, neither will I.
Finally, when linked with the rest of the program, this does what I wanted in the very beginning. I just spelled out a bunch of byte values as data in the assembly, exposed them as public symbols, and lied to the compiler that they are really functions. Using assembler lets me control the exact layout so there will not be any alignment padding, and I can have my cookie and eat it, too.
So everything is well and good, time to move on, right? Right? Well of course not! The far call is there, but I am missing the subsequent add sp, 0x4
instruction which takes the arguments off the stack after the far call is done. This is the C code:
…and here is the comparison of the generated assembly (on the right) with the original game exe’s instructions (on the left). I am using my tool (mzdiff) from mzretools to compare the binaries, the ‘==’ mark means an exact instruction match, ‘~=’ or ‘=~’ mean an inexact match with a difference in either the left or the right operand, and ‘!=’ is of course a mismatch which interrupts the tool, telling me that I need to fix a mistake.
There are different function calling conventions, which stipulate who (caller or callee) is supposed to clean up the stack, and the MS C functions use the cdecl
conventions which mean the caller is supposed to clean up, but the compiler also supports the pascal
and fortran
conventions. Maybe I need to specify cdecl
explicitly on the function prototype?
It took an evening of deep thought, but finally I figured it out. Bottom line, the declaration in the header file does not matter. I tried adding the argument types explicitly, removed the return type, tried switching the order of the far
and cdecl
specifiers. It turns out that the compiler is just lazy, and it will not emit an instruction to do stack cleanup if that’s the last call in the file - why bother? So it’s just a matter of adding a dummy call to another function in the C source for the comparison to pass, to be removed later.
And now it finally fits:
This is just some work that needed to be done to reimplement a couple of instructions of the game in C. And in the end it even turned out that C was not enough, but I’m okay with a small snippet of assembly that just contains a block of data to be linked with the rest of the recreation code. Actually it’s not that bad, because there will be many places in the code where overlays will get called, and doing this research up front lets me cover all those easily. I just thought it was interesting to show how much work goes into figuring minute stuff like this out, before transcribing the game disassembly into C becomes an almost mechanical task. I’m not at that point yet, but I’m excited for what will come next.
An aspect of doing in-depth analysis of the code in this way, that I did not expect, is that one can’t help but imagine what went through the head of the person who wrote this code over 30 years ago. I’m betting some of the same thoughts went through their head. Should I copy-paste these functions, or should I make a macro? How best to reuse the code that will be shared between the multiple binaries? What kind of compiler and linker options do I use? It is kind of like going back in time and being there with the programmer as they are figuring out how to make this game work, and is part of why this project fascinates me so much.
You are an interesting man to not know, Mr. Hollis.