The thing won't START
(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)
As mentioned before, the state of the project right now is that although the first executable (START.EXE) has been successfully reconstructed, it unfortunately freezes upon startup. I decided to write up the situation and the steps I tried to resolve the problem, hoping maybe putting it into ordered words might give me an idea on how to progress.
Doing it the hard way
The first thing I tried was looking in the DosBox debugger to figure out where the freeze occurs. This is only a little bit easier than doing the same for the original executable, as DosBox does not show me symbol names, and I need to switch back and forth between it, the disassembly and the code editor. I also discovered that the breakpoints are not very reliable. When I placed a breakpoint on a location, it would fire the first time around, but not a second time after I continued, leading me to believe control was not returning there, instead going into some invalid code and freezing.
This leads us to the sad fact that debugging real mode DOS applications is a pain. In Linux, I would probably get a SIGSEGV that I could catch in the debugger, and I would have solved this months ago. Here, if something goes wrong, the code will happily jump into the weeds and roll around in there indefinitely.
In any case, once the freeze was observed, breaking manually (Alt-Break, or Alt-Fn-B for Logitech keyboards missing a Break key) did not lead to any enlightenment. Usually, I would find myself in a weird CS=F000 location, seemingly containing some internal DosBox code for handling keyboard input (not sure if it has to do with my pressing Alt-Break, which would be stupid). Bottom line, I can’t figure out where it freezes from just looking in the debugger.
I recall there being some bugs with breakpoints in vanilla DosBox (which is pretty much unmaintained at this point), so it would be a good idea to try dosbox-staging instead - will try later.
Why not use the right tool for the job?
But in any case, I shouldn’t be having so much trouble. I have the source code for this binary, after all, so I should be able to comfortably debug with symbols. The executable cannot be launched directly without substantial work done by the loader and setup (F15.COM and SU.EXE in the original game), but luckily I have the loader reimplemented already, so I tweaked it a bit to run START.EXE under CodeView. That seems to have been the approach taken by Microprose as well, as I found a path to the CodeView executable in F15.COM, though there’s no more code that uses it.
Unfortunately, CodeView does not run well under DosBox. Well, it barely runs at all, and switching in and out of the debugger from graphical mode seems to completely corrupt the display. This is really my fault as the DosBox readme clearly says it’s not intended for running any other software than games. I really should be whipped for running the compiler under it, but them’s the breaks.
I will keep trying to run CodeView under VirtualBox, 86box and PCem. But to upload my executables for debugging, I need to mount the hard disk images (vdi/vhd/img/…) those use, and I’ve run into problems doing that on WSL, which apparently does not support loadable modules, on which the userspace filesystem utilities seem to depend. Should be easily doable from a vanilla Linux box, but this was getting too annoying, so I decided to leave it for now.
Instrumentation to the rescue
As I remarked, I already have the source code, so how about instrumenting it? Ain’t no better thing than a little printf-debugging, amirite?
I implemented some rudimentary logging facilities in the game, both from the C and assembly side, then peppered the code of main() with traces at important locations, to try to narrow down the area where it was freezing. Soon I had my suspect; the function for showing the first splash image with the MPS Labs logo was apparently not returning:
I started going deeper, into openShowPic() and beyond, adding trace macros. Soon I found myself in assembly code, and it was clear the problem was there. This presented the additional challenge of calling the variadic function from assembly, and doing so conditionally. I came up with this:
It took a while to get all this working, but I’m pretty happy with the result. However, the results raised more questions than they answered. I got as far as this assembly routine:
Here’s the mystery part. When I notice the freeze and close the emulator, I can see the logs from several iterations of the loop over the image rows. The destination offsets make sense, and the destination segment has the expected value of 0xA000, the video memory.
However, the code does not exactly freeze inside any of the routines. If I keep the emulator running longer, the log has entries for more rows (I got up to row 70 after waiting for ~5 minutes). It seems to be progressing, just at a glacial pace? I might try leaving it on to see if it ever succeeds, but so far I haven’t found the patience. There also seems to be something weird about how/when DosBox flushes the logfile to the host OS. I’m pretty sure I am losing some output lines when I close the frozen game.
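One way to take buffering out of the picture would be to reopen the log in append mode for every line and close it again right away, so nothing stays in the C runtime's buffer when the emulator is killed. A minimal sketch, assuming a hosted C library with `vfprintf`; the name `trace_line` and the log path are hypothetical, and whether this actually cures the lost lines under DosBox is untested:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical trace helper (not the project's actual logging routine).
 * Appends one formatted line to the file at `path`, then closes the file
 * so the line is handed to the OS before the function returns.
 * Returns the number of characters written including the newline,
 * or -1 on any failure. */
static int trace_line(const char *path, const char *fmt, ...)
{
    FILE *log;
    va_list args;
    int written;

    assert(path != NULL && fmt != NULL);

    log = fopen(path, "a");          /* append: keeps earlier lines */
    if (log == NULL)
        return -1;

    va_start(args, fmt);
    written = vfprintf(log, fmt, args);
    va_end(args);

    if (written >= 0 && fputc('\n', log) != EOF)
        written++;                   /* count the newline as well */
    else
        written = -1;

    if (fclose(log) != 0)            /* closing flushes the stream */
        written = -1;

    return written;
}
```

A call could look like `trace_line("TRACE.LOG", "row=%d", row);`. Opening and closing the file for every line is slow, so this trades speed for a log that stays complete up to the last line written.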
I decided to add some extra code to my C tracing routine to include a time delta value in seconds from the previous log line, to see where the slowdown is occurring. But inspecting the output, I can see that almost all the deltas are 0 seconds, except for a single one taking 1 second. How can the program be both slow and fast at the same time?
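A delta in whole seconds reports 0 for anything shorter than a second, so a finer-grained counter might reveal more. A minimal sketch, assuming the runtime's `clock()` advances with elapsed time (on some platforms it counts CPU time instead) and with an actual resolution that depends on the compiler; `ticks_to_ms` and `trace_delta_ms` are hypothetical names:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: converts the distance between two tick counts into
 * milliseconds. Unsigned subtraction keeps the result correct across one
 * wrap-around when both counts are full-width unsigned long values. */
static unsigned long ticks_to_ms(unsigned long start, unsigned long end,
                                 unsigned long ticks_per_sec)
{
    unsigned long delta = end - start;

    assert(ticks_per_sec > 0);

    /* Whole seconds and the remainder are scaled separately so the
     * multiplication by 1000 is less likely to overflow. */
    return (delta / ticks_per_sec) * 1000UL
         + (delta % ticks_per_sec) * 1000UL / ticks_per_sec;
}

/* Hypothetical helper: milliseconds since the previous call, 0 on the
 * first call. ASSUMPTION: clock() tracks elapsed time on this runtime. */
static unsigned long trace_delta_ms(void)
{
    static int have_last = 0;
    static clock_t last;
    clock_t now = clock();
    unsigned long ms = 0;

    if (have_last)
        ms = ticks_to_ms((unsigned long)last, (unsigned long)now,
                         (unsigned long)CLOCKS_PER_SEC);

    last = now;
    have_last = 1;
    return ms;
}
```

Logging the value of `trace_delta_ms()` with each trace line would show sub-second gaps that a seconds-based delta rounds away.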
This is the worst kind of bug: a problem which changes, or appears to change, its behaviour when I try to look at it. I can’t catch it in the debugger, and traces are unreliable. How can I figure out the cause of a problem which seems to defy causality?
This is going to take a while
For now, I seem to be out of (good) ideas. What I’m left with is as follows:
- Try using the dosbox-staging debugger and/or CodeView under a different emulator/hypervisor. Perhaps I can glean something from a more dependable debugging environment.
- Compare the execution with the original executable, see if the variables have roughly the same values. Would be a pain though, maybe the DosBox debugger’s named variable lists (LV/SV/IV) could make it easier.
- Perhaps my time delta logging has problems. Would need to check it against a sleep() call, but this compiler doesn’t have one, so I need to write one myself.
- Try running the “frozen” game a little longer, see if it ever gets anywhere.
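For the sleep() item above, a busy-wait built on the standard `time()` function might be enough to sanity-check the delta logging. A minimal sketch, assuming `time()` is available and counts in seconds; `busy_sleep` is a hypothetical name:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for sleep(): spins until the clock reported by
 * time() has advanced by at least `seconds` whole seconds.
 * ASSUMPTION: time_t counts seconds, as it does on common C runtimes. */
static void busy_sleep(unsigned seconds)
{
    time_t start = time(NULL);

    assert(start != (time_t)-1);   /* time() must be functional */

    while (time(NULL) - start < (time_t)seconds) {
        /* spin; no work is done while waiting */
    }
}
```

Because `time()` only changes once per second, the real wait lands somewhere between `seconds - 1` and `seconds`, but a delta logger based on the same clock should report exactly `seconds` around a call such as `busy_sleep(3)`.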
One more thing that comes to mind is the layout of the data segment. Part of the point of all this was to make the executable independent of the data layout. But perhaps I missed some offset when changing numeric literals to variable offsets. Or there might be a piece of code that expects one piece of data to reside right after another, and my executable doesn’t match that. So I could try aligning the data layout with the original, at least for now. The good news is that since I still have all the data in assembly, most of it matches already. After dumping the data segments of the original and the recreation, then comparing the hex dumps in WinMerge, I can see they only seem to differ in where the libc data is placed: it comes in the middle of the data segment in the original, while my recreation has it at the end. I think I could tweak the linking order and get a perfect match. Hopefully that will be enough to get it to run, and then maybe I can figure out where it’s breaking by changing things a bit at a time.
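The dump comparison could also be scripted instead of eyeballed in a diff tool. A minimal sketch, assuming both data segments have already been read into equally sized buffers; `next_diff_run` and `report_diff_runs` are hypothetical names, and a purely positional compare like this will flag everything after a block that merely moved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: finds the next run of differing bytes at or after
 * offset `from`. On success stores the half-open range [*start, *end)
 * and returns 1; returns 0 when the rest of the buffers is identical. */
static int next_diff_run(const unsigned char *a, const unsigned char *b,
                         size_t len, size_t from,
                         size_t *start, size_t *end)
{
    size_t i = from;

    assert(a != NULL && b != NULL && start != NULL && end != NULL);

    while (i < len && a[i] == b[i])   /* skip the matching prefix */
        i++;
    if (i >= len)
        return 0;

    *start = i;
    while (i < len && a[i] != b[i])   /* extend over differing bytes */
        i++;
    *end = i;
    return 1;
}

/* Hypothetical helper: prints every differing range as hex offsets and
 * returns how many ranges were found. */
static unsigned report_diff_runs(const unsigned char *a,
                                 const unsigned char *b,
                                 size_t len, FILE *out)
{
    size_t from = 0, start, end;
    unsigned runs = 0;

    while (next_diff_run(a, b, len, from, &start, &end)) {
        fprintf(out, "%04lX-%04lX differ (%lu bytes)\n",
                (unsigned long)start, (unsigned long)(end - 1),
                (unsigned long)(end - start));
        runs++;
        from = end;
    }
    return runs;
}
```

A handful of short runs would point at individual mismatched values, while one long run starting mid-segment would be consistent with a relocated block such as the libc data.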
There’s also the more frightening prospect - perhaps a numeric constant was supposed to be just that, and I blithely changed it into an offset to a variable? That would be hard to find at this point.
A lot of open questions, hardly any answers. But at least I’m back into it, interested and invested. I’m sure I can crack it given time. When I do, I’ll be sure to write up Part 2.