What does it take to take an old game apart? (Part 1)

This is a one-off, lengthy post that I decided to write up to summarize the know-how necessary to take a project of this kind, and the tools and methods that can be employed in order to get it done.

A lot of this stuff is probably going to sound basic to anybody who has an idea of what reverse engineering entails, but I’m writing this for the benefit of anyone who might want to pick up a similar project, as well as to organize my own thoughts. Parts of it might be vague or philosophical, so just bear with me or ignore this rant.

A game is made of assets, like models, textures and world maps, and the code that makes use of those assets and implements additional game logic to create a dynamic environment that responds to the player’s inputs.

Code

The game code is implemented in a programming language of the developer’s choice. Back in the time F15 SE2 was made, that language would often be pure assembly for the particular platform the game was developed for (which would prove a hindrance if you wanted to port your game to different platforms). Part of the reason was, the hardware was quite limited in its capabiliies, and to make best use of it, or perhaps even be able to have to your game run with acceptable performance in the first place, you needed to get close to the metal and directly manipulate the hardware on the lowest level - no fancy pre-made engines or libraries were available, you had to roll your own.

Slowly, compilers for high level languages like C and Pascal were starting to appear on personal computers, but their capabilities were severely limited compared to the level of optimizations modern compilers are capable of - today your code will be broken up into pieces, reordered, scattered all over the compiled binary in a form beyond recognition. Back in the late 80s, what your wrote in a high level language was translated mostly directly into sequences of machine code, and the code generated often kinda sucked, so that’s another reason people chose to go with writing assembly by hand in the first place.

After a program written in a high level language gets compiled into executable form, information is lost. The functions that the programmer named so carefully to have their purpose clearly documented are just a sequence of binary-encoded CPU instructions without a name, surrounded by other sequences of instructions belonging to other functions. Variables in the code that had nice names to say what purpose they serve, now become nameless sequences of binary data in the game’s executable, with no clear boundaries to show where they were supposed to start or end. I’ve heard it described that reverse engineering compiled binary code is like trying to extract the eggs, flour and sugar after the cake’s been baked, and I think that’s a good analogy. You need to disassemble the binary game code into assembler text, and study that text to try and understand what it is trying to accomplish, then write code that does the same thing in the same language and operating system that the original was implemented on, or on a different platform altogether.

Data

The assets will nowadays usually be produced by some powerhouse commercial software for 3d modelling and effects, texture design, audio engineering etc., with the result being a binary blob in some relatively common format (JPG/PNG/MP3/whatever) Back then, such software was either non-existent, was prohibitively expensive, or had pitiful functionality and/or performance, which is why again, people working in those game companies wrote their own sprite, 3d model or map design software, and the data formats were proprietary and often specific to that particular company.

It is obviously necessary to understand the formats used by a game when trying to reimplement it, perhaps even more so than understanding the code itself - after all, if you knew how the formats work, you could go about implementing an alternative implementation, without ever taking a look at the code. You would not have preserved the identical behaviour of the original, but all the content would be there, and it’s a matter of what you perceive what your goal was at the beginning, to determine if that’s good enough or not. See, I told you this would get philosophical.

So, reversing based on figuring out the data formats alone is possible, but I don’t posess a code breaker mentality, and I have hardly any information on the data to go on, which means that (for me at least), the only real way forward of figuring out how a format like that works, is to infer it from the way the code interacts with the files containing it, and how it uses that data during gameplay.

Operating system

DOS is a relatively simple OS, which provides the barest of abstractions over the computer hardware. You are free to bypass DOS at any time (actually, it’s usually required if you want decent performance), unlike a modern OS you can write to any area of memory (no memory protection, normally we are running in real mode), including memory occupied by DOS itself, you can (and must) interact with hardware behind DOS’s back and do all kinds of wild stuff in order to achieve basic functionality, like putting graphics on screen.

In fact, pretty much the only comprehensive service DOS provides is managing the filesystem on the disk for you, so you can open files and directories without reading raw sectors off the disk. Well, at least it’s something. Yeah, I know, there’s time and date, DOS will load your program and set it up and make sure it can run, it can also do some other stuff, but not much that’s useful in game programming. In particular, there is no universal framework or drivers for display or audio hardware, which is why games from that period had to implement support for those on their own, and most games supported multiple variants of hardware (especially soundcards after everybody settled on SVGA and VESA for high-performance display purposes. There’s no multithreading out of the box, actually there’s not even a capability to run more than one process at once (technically there are TSRs, but they are not a part of what we might consider a multitasking environment).

Still, an understanding of DOS services and internals is crucial to understanding what game code running under it is trying to do.

(continues in Part 2)