(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)

Recently, I’ve been on a roll, completing one routine after another. Then suddenly, after fixing the last problem in a medium-sized routine I was working on, I was presented this output from make verify, the Makefile target which builds the reconstructed executable and runs mzdiff to compare it to the original:

[...]
1000:3b87/013b87: pop bp                           == 1000:27a9/0127a9: pop bp
1000:3b88/013b88: ret                              == 1000:27aa/0127aa: ret
Reached end of routine block @ 1000:3b88/013b88
Completed comparison of routine sub_13A61, no more reachable blocks
Comparison result: match
Load module of executable is 45270/0xb0d6 bytes
--- Routine map stats (static):
Routine map of 250 routines covers 27498/0x6b6a bytes (60% of the load module)
Reachable code totals 23944/0x5d88 bytes (87% of the mapped area)
Unreachable code totals 3710/0xe7e bytes (13% of the mapped area)
Excluded 182 routines take 12035/0x2f03 bytes (43% of the mapped area)
Reachable area of excluded routines is 8535/0x2157 bytes (35% of the reachable area)
--- Comparison run stats (dynamic):
Seen 138 routines, visited 68 and compared 15409/0x3c31 bytes of opcodes inside (64% of the reachable area)
Ignored (seen but excluded) 70 routines totaling 3268/0xcc4 bytes (13% of the reachable area)
Practical coverage (visited & compared + ignored) is 18677/0x48f5 (78% of the reachable area)
Theoretical(*) coverage (visited & compared + reachable excluded area) is 23944/0x5d88 (100% of the reachable area)
(*) Any routines called only by ignored routines have not been seen and will lower the practical score,
    but theoretically if we stepped into ignored routines, we would have seen and ignored any that were excluded.

So… that’s it? We’re done (with the first executable of 3)? Well, the answer is not that simple. TL;DR: not really.

But it’s a milestone nonetheless, and to celebrate, I’ve decided to release the code that I have written up till now. So, enjoy.

Now the thing I wish to address is the completion metrics, and what they actually mean. I have a bunch of those on my tooling, so let’s discuss them starting from the top, in order to try and estimate the amount of work left.

Completion according to lst2ch.py

When I make a change in the IDA database, I have it export the listing (.lst) file, which triggers a recreation of the autogenerated C header file using the Python script lst2ch.py. That script walks over the listing and gathers a bunch of information, which it prints out before terminating:

ninja@RYZEN:f15se2-re$ make verify
mzretools/tools/lst2ch.py lst/start.lst src conf/start_rc.json --noc
Found matching include file: lst/start.inc, parsing
Writing C header file: src/start.h
Found routines: total: 249, ignored: 53, remaining: 196, ported: 68, unported: 128
Accumulated routines' size: remaining: 21638/0x5486/100%, ported: 15424/0x3c40/71.28%, unported: 6214/0x1846/28.72%
Found 986 variables, size = 31504/0x7b10
[...]

Because this tool walks the IDA listing, it has (at least initially) a more completionist view of all the routines than mzdiff, which relies on a routine map created by mzmap, but the latter is meant to be manually adjusted as the reconstruction progresses, until eventually becoming equivalent. Here, the script can see 249 routines in the listing. The ignored ones are pretty much the libc routines which were marked by IDA based on signature comparison, but the tool counts routines as ignored based on the fact that their names in the listing begin with an underscore or equal ‘start’ (the CRT0 stuff). Yeah, not perfect, but this is only supposed to be a rough estimate.

Next, it takes the remaining routines (all of them minus the ignored ones), sums up their extents from the proc keyword up to the endp, and that is our baseline 100% completion. Chunks, i.e. detached bits of code that belong to a routine but lay outside its primary extents are not covered, so again, this is not perfect but doesn’t need to be. Then it goes over the list of items in the extract list, which are address ranges marked for snipping out of the listing in the JSON configuration file, presumably because they are/will be replaced with C code, and that counts as the “ported” amount. The remainder is obvious, these are the routines which have not been ported yet, for whatever reason (more on that in a minute).

Makes sense so far? The percentages are only rough because it depends where your baseline is, and this tool sets that at the cutoff level of the ignored routines, with the disctinction being kinda flaky especially around chunks. In any case, I seem to have about 70% of the relevant code reconstructed into C at this point, now let’s try to come up with some more exact numbers.

Completion according to mzdiff

Next, the game executable gets built from the reconstructed source + the assembly containing everything not reconstructed yet which is autogenerated from the complete listing by lst2asm.py, and it gets compared to the original with mzdiff to form the output I opened this post with, so let me pull up the stats from that again for quick reference:

--- Routine map stats (static):
Routine map of 250 routines covers 27498/0x6b6a bytes (60% of the load module)
Reachable code totals 23944/0x5d88 bytes (87% of the mapped area)
Unreachable code totals 3710/0xe7e bytes (13% of the mapped area)
Excluded 182 routines take 12035/0x2f03 bytes (43% of the mapped area)
Reachable area of excluded routines is 8535/0x2157 bytes (35% of the reachable area)
--- Comparison run stats (dynamic):
Seen 138 routines, visited 68 and compared 15409/0x3c31 bytes of opcodes inside (64% of the reachable area)
Ignored (seen but excluded) 70 routines totaling 3268/0xcc4 bytes (13% of the reachable area)
Practical coverage (visited & compared + ignored) is 18677/0x48f5 (78% of the reachable area)
Theoretical(*) coverage (visited & compared + reachable excluded area) is 23944/0x5d88 (100% of the reachable area)
(*) Any routines called only by ignored routines have not been seen and will lower the practical score,
    but theoretically if we stepped into ignored routines, we would have seen and ignored any that were excluded.

The comparison tool does not base its operation on the IDA listing, but the map file that originally came from mzmap, and was tweaked manually thereafter to correspond to the layout seen by IDA. Also, mzdiff starts off from an entrypoint in both executables, and walks the opcodes in both in lockstep, comparing them and storing calls and jumps in a DFS search queue. Because it does not know the layout of the reconstructed binary (the mapfile describes the original), it does not know where an equivalent reconstructed routine resides for comparison - it has to scrape that info from watching call and jump destinations. This means that it cannot compare what it cannot reach, and it cannot reach what cannot be determined from static analysis alone. So far, I haven’t encountered a case where it failed to reach a relevant routine, but it does say that reachable code was only 87% of the area covered by the map - all the rest will need to be carefully investigated to figure out if it’s really unreachable, or just the tool can’t figure out how to get there - and develop capabilities to instruct it in that regard. But for what it’s worth, IDA also couldn’t find any xrefs to those routines (I checked), so might be the executable just contains a bunch of leftover garbage – or historical gems, depending on one’s perspective.

It also counts more routines than before (182) as excluded - in addition to the libc routines, the mapfile instructs the tool to ignore additional routines. In general, the mapfile syntax supports the following annotations on mapfile routine specifications:

  • ignore: a block of code which should not be visited during comparison with mzdiff. Direct use is typically reserved for data in the code segment, or dummy routines which do not do anything.
  • external: a routine from an external library which we do not need to reverse or reconstruct, e.g. libc. This implies ignore.
  • detached: a routine that appears unreachable. This also implies ignore.
  • complete: this routine has been completely reconstructed into identical C code, but will be visited for comparison anyway to make sure nothing was broken. The annotation is just for keeping track of completed work.
  • assembly: a routine which appears to have been written in assembly, so it will never be possible to reconstruct with C. This implies ignore by default, but can be disabled as needed.

A routine with no annotations is considered unfinished work and will be visited for comparison. The assembly routines will be ignored while work is carried out on the C reconstruction, then once the reconstruction stage is finished like now for start.exe, they will be un-ignored with a commandline argument, so I can work on making them runnable. Finally, once I start porting them into C, the ignore will need to cover them up again.

That all ignoring business means the tool only performed the comparison on 64% of the reachable area, which as you remember is in turn only 87% of the greater mapped area (which in turn is 60% of the load module, or the EXE file image). The good news is that the tool seems to have accounted for 100% of the reachable area, either directly through the comparison, or because they were excluded for whatever reason (libc, unreachable, or ignored assembly routine).

OK, this is giving me a headache. Just how much stuff is there actually left to do?

Completion according to mzmap

If you have two sets of statistics which don’t make your view of a problem any clearer, surely the answer is to implement a third set of statistics in another tool, isn’t it? I wish I was joking.

The problem was that mzdiff is really not designed to gather and present this information – its goal is to compare instructions. Also, until a few days ago, I did not have the routine annotations external/detached/assembly implemented, so all the ignored routines were all thrown into one bag, making it difficult to see which ignored routines were part of libc and really didn’t need to get touched, which ones were part of the game but seemed unreachable so there didn’t seem to be a point reconstructing them, and which were not included in the comparison because of being written in assembly, but would need porting work at some point.

So, in addition to implementing the additional annotations and fixing some bugs, I ended up extending mzmap to also print out the contents of the mapfile in detail and went over the IDA listing to update the contents of the mapfile: figure out what routines were missing from the map, which ones could be considered detached, that sort of thing. The result is as follows:

ninja@RYZEN:f15se2-re$ mzmap map/start.map --brief
Showing only only uncompleted routines and unclaimed blocks in code segments
--- Routine map containing 250 routines
Size 45270/0xb0d6
Code1 CODE 0000
Code2 CODE 06a5
Data1 DATA 06b5
Stack1 STACK 0e66
--- Summary:
Code size: 27316/0x6ab4 (60% of load module)
  Completed: 15463/0x3c67 (68 routines, 56% of code) - 1:1 rewritten to high level language
  Uncompleted: 0/0x0 (0 routines, 0% of code) - routines not yet rewritten which can be
  Assembly: 2842/0xb1a (40 routines, 10% of code) - impossible to rewrite 1:1
  Ignored: 8978/0x2312 (99 routines, 32% of code) - excluded from comparison
    External: 5288/0x14a8 (56 routines, 58% of ignored) - e.g. libc library code
    Other: 3690/0xe6a (43 routines, 41% of ignored) - code ignored for other reasons
      Reachable: 249/0xf9 (9 routines, 6% of other) - code which has callers
      Unreachable: 3441/0xd71 (34 routines, 93% of other) - code which appears unreachable
  Unclaimed: 33/0x21 (33 blocks, 0% of code) - holes between routines not covered by map
  Unaccounted: 0/0x0 (0 routines) - consistency check, should be zero
Data size: 17954/0x4622 (39% of load module)
  Routines in data segment: 215/0xd7, 43 routines

I’m telling it to be --brief to hide the non-problematic items in the map, the idea here is to only show areas which are uncompleted or unclaimed (refer to the explanations in the output). Based on this, my stats are not so rosy; I only have 56% of the code rewritten into C, but remember this is in relation to the entire code segment size, which contains libc and some apparently unreachable code. The important bit at this stage is the “Assembly” stat, which tells me I have about 2.8kB of code in 40 assembly routines to worry about. Unless some of the unreachable code (of which there also seems to be about 3kB) turns out to be actually reachable - but I’ll worry about it if and when.

Seeing as transcribing 15kB of C code apparently took me 2 years could be interpreted that I need 4,5 months for the assembly, but this is really comparing apples to oranges because for better or worse, with the assembly I don’t need to do any guesswork, and a lot of those 2 years was spent learning about how the compiler operates, hunting for the right build flags and developing the tooling.

So, now what?

With the C reconstruction work done for start.exe, the work now switches gears into assembly tweaking and porting. The first order of business is to get the game runnable. I need to go over the 40 routines, replace any hardcoded offsets with data symbol references, build the executable and make it work inside the game. Actually, let’s flip ignoring the assembly routines off for a quick test in mzdiff:

ninja@RYZEN:f15se2-re$ make verify
mzretools/debug/mzdiff bin/start.exe:0x10 build/start.exe:[558bec83ec1c56c706] --verbose --loose --ctx 30 --map map/start.map --asm
[...]
--- Now @1000:2bba/012bba, routine 1000:2bba-1000:2c58/00009f: clearRect [near] [assembly], block 012bba-012c58/00009f, target @1000:3ec0/013ec0
1000:2bba/012bba: push bp                          != 1000:3ec0/013ec0: ret
ERROR: Instruction mismatch in routine clearRect at 1000:2bba/012bba: push bp != 1000:3ec0/013ec0: ret
--- Context information for up to 30 additional instructions after mismatch location:
1000:2bbb/012bbb: mov bp, sp                       != 1000:3ec1/013ec1: ret
1000:2bbd/012bbd: push di                          != 1000:3ec2/013ec2: ret
[...]
Load module of executable is 45270/0xb0d6 bytes
--- Routine map stats (static):
Routine map of 250 routines covers 27498/0x6b6a bytes (60% of the load module)
Reachable code totals 23944/0x5d88 bytes (87% of the mapped area)
Unreachable code totals 3710/0xe7e bytes (13% of the mapped area)
Excluded 142 routines take 9193/0x23e9 bytes (33% of the mapped area)
Reachable area of excluded routines is 5724/0x165c bytes (23% of the reachable area)
--- Comparison run stats (dynamic):
Seen 42 routines, visited 24 and compared 4252/0x109c bytes of opcodes inside (17% of the reachable area)
Ignored (seen but excluded) 18 routines totaling 153/0x99 bytes (0% of the reachable area)
Practical coverage (visited & compared + ignored) is 4405/0x1135 (18% of the reachable area)
Theoretical(*) coverage (visited & compared + reachable excluded area) is 9976/0x26f8 (41% of the reachable area)
Missed (not seen and not excluded) 84 routines totaling 13878/0x3636 bytes (50% of the covered area)
(*) Any routines called only by ignored routines have not been seen and will lower the practical score,
    but theoretically if we stepped into ignored routines, we would have seen and ignored any that were excluded.

As expected, the comparison failed on the first assembly routine (clearRect) that it entered because my tooling replaces all routines remaining in the assembly with ret stubs. Also, the statistics took a dive because we stopped at around 4kB into the comparison. But once the assembly routines are done (and it should be an easier task than guessing the C code to emit identical assembly), then we will jump back to 100% coverage again.

At that point I will have code for a binary that’s “free” - I will be able to add to it, and it should be tolerant of changes in internal layout. Then, all these assembly routines that can, will be rewritten into C to make the game more easily portable outside MS-DOS in the future. Doing so will make them completely divergent and non-comparable, so I will need to figure out some way to make sure they operate correctly, potentially with unit testing. Only then will I be free to move on to EGAME.EXE and do the whole thing all over again. And that doesn’t even cover the fact that the source code is still obscure and will need research to fully document all the routines and data which for now have placeholder names, since I have no idea what they are doing, even though I might have transcribed some of them into C.

But then again, I am happy to have all my ducks in a row with the mapfile aligned, the stats confirming I haven’t missed any relevant routines, and the current goal of the ~3kB of assembly routines clear in sight.