<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2026-06-21T00:40:54+00:00</updated><id>/feed.xml</id><title type="html">neuviemeporte’s journal</title><subtitle>Misadventures in 16bit reverse engineering, programming WTFs and general rants about the IT industry.</subtitle><entry><title type="html">Meatbag is useless, meatbag sad</title><link href="/f15-se2/2026/06/21/meatbag.html" rel="alternate" type="text/html" title="Meatbag is useless, meatbag sad" /><published>2026-06-21T00:00:00+00:00</published><updated>2026-06-21T00:00:00+00:00</updated><id>/f15-se2/2026/06/21/meatbag</id><content type="html" xml:base="/f15-se2/2026/06/21/meatbag.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<p>I realize this might be a little controversial, but I wanted to share a bit of my perspective on the usage of LLMs in a project like this. I must admit I was initially pooh-poohing the insane rate of progress achieved by @AJenbo with them, thinking sure, they can figure out how to do this from all of this research and code I did upfront, but surely I am the Real Deal, only I carry the wisdom, and I will be the final oracle to consult when inevitably the dumb machines reach the limit of their context windows, haha. I think this is a common sentiment among IT professionals from reading comments on technology forums and such. I’m still not sure how all of this will play out in the broader context of the use of LLMs in software development. For sure, there is a lot of grift, greed, rabid hype, lies and outright scams in the industry around “AI” while everybody is trying to make a buck. But I stopped thinking I was smarter than an LLM. Let me show you why with a couple of examples from the C code reconstruction of the largest of the 3 executables that make up this game, <code class="language-plaintext highlighter-rouge">egame.exe</code>.</p>

<h2 id="a-small-thing-to-break-your-brain">A small thing to break your brain</h2>

<p>We’re in the routine <code class="language-plaintext highlighter-rouge">computeHudAttitude</code>. A wee bit of assembly doesn’t want to cooperate:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="nf">sub</span> <span class="nb">ax</span><span class="p">,</span> <span class="nb">ax</span>
<span class="nf">sub</span> <span class="nb">ax</span><span class="p">,</span> <span class="p">[</span><span class="mh">0x581c</span><span class="p">]</span>
<span class="nf">mov</span> <span class="p">[</span><span class="mh">0x581c</span><span class="p">],</span> <span class="nb">ax</span></code></pre></figure>

<p>How hard can it be? Clearly it’s just <code class="language-plaintext highlighter-rouge">var = -var</code>. But hold on, cowboy. The assembly the compiler generates for that is this instead:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="nf">mov</span> <span class="nb">ax</span><span class="p">,</span> <span class="p">[</span><span class="mh">0x581c</span><span class="p">]</span>
<span class="nf">neg</span> <span class="nb">ax</span>
<span class="nf">mov</span> <span class="p">[</span><span class="mh">0x581c</span><span class="p">],</span> <span class="nb">ax</span> </code></pre></figure>

<p>Stuff like this always smells of signedness, so I tried flipping it with no success. Then the obvious <code class="language-plaintext highlighter-rouge">var = 0 - var</code>, and then every possible stupid way to write this, including insane casts and questionable stuff like <code class="language-plaintext highlighter-rouge">(var ^ var) - var</code>, <code class="language-plaintext highlighter-rouge">-(&amp;var)[0]</code> and a ternary expression (don’t ask). Changing optimization flags didn’t work. Nothing worked.</p>

<p>Then collaborator @xor2003 got this out of his LLM:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">var</span> <span class="o">=</span> <span class="mh">0x10000</span> <span class="o">-</span> <span class="n">var</span><span class="p">;</span></code></pre></figure>

<p>…the constant doesn’t even fit in 16 bits… but I guess it amounts to <code class="language-plaintext highlighter-rouge">0xffff+1</code>, which wraps around to zero in a 16-bit register.</p>
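
<p>For what it’s worth, the arithmetic behind that constant can be sanity-checked on a modern compiler. The sketch below is not the project’s code: <code class="language-plaintext highlighter-rouge">neg16_via_wrap</code> and <code class="language-plaintext highlighter-rouge">neg16_plain</code> are made-up helpers, and it assumes a 16-bit <code class="language-plaintext highlighter-rouge">int</code> can be emulated by truncating to <code class="language-plaintext highlighter-rouge">uint16_t</code> on a two’s complement machine. It only shows that <code class="language-plaintext highlighter-rouge">0x10000 - var</code> and <code class="language-plaintext highlighter-rouge">-var</code> agree once the result is cut down to 16 bits; it says nothing about why MSC picks different instructions for the two spellings.</p>

<figure class="highlight"><pre><code class="language-c" data-lang="c">#include &lt;stdint.h&gt;

/* Hypothetical helper: computes 0x10000 - var in 32 bits,
 * then keeps only the low 16 bits, as a 16-bit register would. */
int16_t neg16_via_wrap(int16_t var)
{
    uint32_t wide = UINT32_C(0x10000) - (uint16_t)var;
    /* Narrowing to int16_t assumes the usual two's complement wraparound. */
    return (int16_t)(uint16_t)wide;
}

/* Plain negation, truncated the same way for comparison. */
int16_t neg16_plain(int16_t var)
{
    return (int16_t)(uint16_t)(0u - (uint16_t)var);
}</code></pre></figure>

<p>Under those assumptions the two helpers return the same value for every 16-bit input, for example <code class="language-plaintext highlighter-rouge">-5</code> for an input of <code class="language-plaintext highlighter-rouge">5</code>, so the difference between the two spellings is purely in what the compiler decides to emit.</p>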

<p>Any questions? No? That’s fine, I don’t have any either. Just want to crawl under a rock.</p>

<h2 id="make-double-sure-this-var-is-this-var">Make double sure this var is this var</h2>

<p>This happened in a routine that’s called <code class="language-plaintext highlighter-rouge">fireAirThreat</code> today, and now looks much better than this, but this is what we were working with originally:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">    <span class="n">i</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="n">int16</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">stru_3B202</span><span class="p">[</span><span class="n">param_1</span><span class="p">].</span><span class="n">state</span><span class="p">[</span><span class="mi">14</span><span class="p">];</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">g</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span> <span class="o">&lt;</span> <span class="o">*</span><span class="p">(</span><span class="n">uint16</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">sams</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">field_8</span> <span class="o">&amp;&amp;</span>
        <span class="p">(</span><span class="kt">unsigned</span><span class="p">)(</span><span class="o">-</span><span class="p">(</span><span class="n">word_330B8</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">-</span> <span class="mh">0x10</span><span class="p">))</span> <span class="o">&lt;</span> <span class="n">g</span> <span class="o">&amp;&amp;</span>
        <span class="n">g</span> <span class="o">&lt;</span> <span class="mh">0x1000</span> <span class="o">&amp;&amp;</span>
        <span class="n">i</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* launch missile into slot j */</span>
        <span class="n">stru_335C4</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">mapX</span> <span class="o">=</span> <span class="n">stru_3B202</span><span class="p">[</span><span class="n">param_1</span><span class="p">].</span><span class="n">posX</span><span class="p">;</span>
        <span class="n">stru_335C4</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">mapY</span> <span class="o">=</span> <span class="n">stru_3B202</span><span class="p">[</span><span class="n">param_1</span><span class="p">].</span><span class="n">posY</span><span class="p">;</span>
        <span class="n">stru_335C4</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">alt</span> <span class="o">=</span> <span class="n">stru_3B202</span><span class="p">[</span><span class="n">param_1</span><span class="p">].</span><span class="n">alt</span> <span class="o">-</span> <span class="mh">0x19</span><span class="p">;</span>
        <span class="n">stru_335C4</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">field_6</span> <span class="o">=</span> <span class="n">sams</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">field_A</span> <span class="o">&gt;&gt;</span> <span class="mi">6</span><span class="p">;</span></code></pre></figure>

<p>Started simple enough with code out of the LLM mostly matching, but a different register was being used around an access to a struct member:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0000:767a/00767a: mov ax, 0x24                     == 0000:4b74/004b74: mov ax, 0x24 ; sizeof(stru_3B202)
0000:767d/00767d: imul [bp+0x04]                   == 0000:4b77/004b77: imul [bp+0x04] ; multiply by index (param_1)
0000:7680/007680: mov si, ax                       == 0000:4b7a/004b7a: mov si, ax ; save for later
0000:7682/007682: mov ax, [si-0x7690]              =~ 0000:4b7c/004b7c: mov ax, [si-0x737e] ; load value of specific member
0000:7686/007686: mov [bp-0x14], ax                == 0000:4b80/004b80: mov [bp-0x14], ax ; store value in stack variable `i`
0000:7689/007689: mov ax, 0x12                     == 0000:4b83/004b83: mov ax, 0x12 ; load sizeof(sams)
0000:768c/00768c: imul [bp-0x14]                   == 0000:4b86/004b86: imul [bp-0x14] ; multiply by `i`
0000:768f/00768f: mov bx, ax                       != 0000:4b89/004b89: mov di, ax ; save for later... whoops!
</code></pre></div></div>

<p>The part leading up to the mismatch deals with the line <code class="language-plaintext highlighter-rouge">i = *(int16 *)&amp;stru_3B202[param_1].state[14]</code>, and shows a familiar pattern I’ve seen with MSC many times before when accessing a member in an array of structs: it loads the constant size of the struct (<code class="language-plaintext highlighter-rouge">0x24</code>) into a register, multiplies it by the index variable (<code class="language-plaintext highlighter-rouge">bp+0x4</code>), then saves the value in register <code class="language-plaintext highlighter-rouge">si</code> because <code class="language-plaintext highlighter-rouge">sizeof(stru_3B202) * param_1</code> will be reused in subsequent accesses to this array element. Next we want to do the same for the comparison inside the <code class="language-plaintext highlighter-rouge">if</code>, but for <code class="language-plaintext highlighter-rouge">sams[i]</code>. Except that when the compiler tries to likewise save <code class="language-plaintext highlighter-rouge">sizeof(sams) * i</code> into <code class="language-plaintext highlighter-rouge">di</code> for reuse, this does not match the reference, which loads the value into <code class="language-plaintext highlighter-rouge">bx</code> instead, seemingly oblivious to the possibility of reuse. Could this be unoptimized code? I tried moving the routine to a file that builds without optimizations, but curiously enough, it didn’t make any difference, and the function looked basically identical, which is also pretty surprising.</p>
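
<p>To make that addressing pattern a bit more concrete, here is a minimal sketch of the offset calculation in portable C. Only the <code class="language-plaintext highlighter-rouge">0x24</code> element size comes from the listing above; <code class="language-plaintext highlighter-rouge">struct Threat</code> and both function names are invented for illustration and are not the real layout of the game’s data.</p>

<figure class="highlight"><pre><code class="language-c" data-lang="c">#include &lt;stddef.h&gt;

/* Hypothetical stand-in for one array element; only the 0x24-byte
 * size is taken from the disassembly, the contents are opaque. */
struct Threat {
    unsigned char raw[0x24];
};

/* Byte offset of element `index` from the start of the array:
 * sizeof(element) * index, i.e. what the mov/imul pair computes. */
size_t element_offset(size_t index)
{
    return sizeof(struct Threat) * index;
}

/* The same offset obtained by letting the compiler index the array. */
size_t element_offset_by_indexing(const struct Threat *base, size_t index)
{
    return (size_t)((const unsigned char *)&amp;base[index] -
                    (const unsigned char *)base);
}</code></pre></figure>

<p>Both functions agree, so for index 3 the offset is <code class="language-plaintext highlighter-rouge">0x6c</code>. In the listing, the displacement of the specific member is then added on top of that product, which is why keeping the product around in a register pays off when several members of the same element are accessed.</p>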

<p>I kept banging my head against the wall and tried everything I could think of, but I could not come up with a way to make the compiler more stupid. Instead, it was making a fool out of me. Again, it was @xor2003 whose LLM came up with this:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">    <span class="k">if</span> <span class="p">(</span><span class="n">sams</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">field_8</span> <span class="o">&gt;</span> <span class="p">(</span><span class="n">g</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">((</span><span class="kt">unsigned</span><span class="p">)(</span><span class="o">-</span><span class="p">(</span><span class="n">word_330B8</span> <span class="o">*</span> <span class="mi">3</span> <span class="o">-</span> <span class="mh">0x10</span><span class="p">))</span> <span class="o">&lt;</span> <span class="n">g</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">g</span> <span class="o">&lt;</span> <span class="mh">0x1000</span><span class="p">)</span> <span class="p">{</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* launch missile into slot j */</span>
        <span class="n">i</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
        <span class="n">stru_335C4</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">mapX</span> <span class="o">=</span> <span class="n">stru_3B202</span><span class="p">[</span><span class="n">param_1</span><span class="p">].</span><span class="n">posX</span><span class="p">;</span>
        <span class="n">stru_335C4</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">mapY</span> <span class="o">=</span> <span class="n">stru_3B202</span><span class="p">[</span><span class="n">param_1</span><span class="p">].</span><span class="n">posY</span><span class="p">;</span>
        <span class="n">j</span> <span class="o">=</span> <span class="n">j</span><span class="p">;</span>
        <span class="n">param_1</span> <span class="o">=</span> <span class="n">param_1</span><span class="p">;</span>
        <span class="n">stru_335C4</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">alt</span> <span class="o">=</span> <span class="n">stru_3B202</span><span class="p">[</span><span class="n">param_1</span><span class="p">].</span><span class="n">alt</span> <span class="o">-</span> <span class="mh">0x19</span><span class="p">;</span>
        <span class="n">stru_335C4</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">field_6</span> <span class="o">=</span> <span class="n">sams</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">field_A</span> <span class="o">&gt;&gt;</span> <span class="mi">6</span><span class="p">;</span></code></pre></figure>

<p>I’m writing this up a while after the fact, so I don’t recall now if breaking up the <code class="language-plaintext highlighter-rouge">&amp;&amp;</code>s into nested conditions was absolutely necessary, but I’ve seen elsewhere that opening up a new condition made the compiler lose its memory of the values involved in the conditionals and forced it to recalculate values that it already had in registers, so it’s possible. But the definite missing piece was the insane assignments of variables to themselves. I feel like I wouldn’t have come up with this in a million years, but the LLM somehow pulled it out of a hat.</p>

<h2 id="now-register-now-you-dont">Now register, now you don’t</h2>

<p>This comes from <code class="language-plaintext highlighter-rouge">drawProjectionSphere</code>. We have a loop filling an array from other arrays:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">    <span class="n">a</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">do</span> <span class="p">{</span>
        <span class="n">f</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">[</span><span class="n">a</span><span class="p">];</span>
        <span class="n">f</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">c</span><span class="p">[</span><span class="n">a</span><span class="p">];</span>
        <span class="n">f</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">a</span><span class="p">];</span>
        <span class="n">f</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">e</span><span class="p">[</span><span class="n">a</span><span class="p">];</span>
        <span class="n">f</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
        <span class="n">f</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="n">e</span><span class="p">[</span><span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
        <span class="n">f</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">[</span><span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
        <span class="n">f</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="n">c</span><span class="p">[</span><span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
        <span class="n">drawPolygonOutline</span><span class="p">(</span><span class="n">word_3298A</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">f</span><span class="p">,</span> <span class="n">a</span> <span class="o">+</span> <span class="mh">0x60</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="o">++</span><span class="n">a</span> <span class="o">&lt;</span> <span class="mi">16</span><span class="p">);</span></code></pre></figure>

<p>I needed to get these instructions:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="nf">mov</span> <span class="kt">word</span> <span class="p">[</span><span class="nb">bp</span><span class="o">-</span><span class="mh">0x04</span><span class="p">],</span> <span class="mh">0x0</span> <span class="c1">; a = 0</span>
<span class="nf">mov</span> <span class="nb">si</span><span class="p">,</span> <span class="p">[</span><span class="nb">bp</span><span class="o">-</span><span class="mh">0x04</span><span class="p">]</span> <span class="c1">; load `a`</span>
<span class="nf">shl</span> <span class="nb">si</span><span class="p">,</span> <span class="mi">1</span> <span class="c1">; multiply by 2 to get offset in array of int16s</span>
<span class="nf">add</span> <span class="nb">si</span><span class="p">,</span> <span class="nb">bp</span> <span class="c1">; rebase the offset to the start of the stack frame</span>
<span class="nf">mov</span> <span class="nb">ax</span><span class="p">,</span> <span class="p">[</span><span class="nb">si</span><span class="o">-</span><span class="mh">0x26</span><span class="p">]</span> <span class="c1">; load the value from the specific stack variable</span>
<span class="nf">mov</span> <span class="p">[</span><span class="nb">bp</span><span class="o">-</span><span class="mh">0x009c</span><span class="p">],</span> <span class="nb">ax</span> <span class="c1">; f[0] = b[a]</span></code></pre></figure>

<p>But again I kept getting a different register, <code class="language-plaintext highlighter-rouge">ax</code>, with the extra complication of this value getting saved onto the stack for no apparent reason:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="nf">mov</span> <span class="kt">word</span> <span class="p">[</span><span class="nb">bp</span><span class="o">-</span><span class="mh">0x04</span><span class="p">],</span> <span class="mh">0x0</span>
<span class="nf">mov</span> <span class="nb">ax</span><span class="p">,</span> <span class="p">[</span><span class="nb">bp</span><span class="o">-</span><span class="mh">0x04</span><span class="p">]</span> <span class="c1">; `a` loaded into `ax` instead</span>
<span class="nf">shl</span> <span class="nb">ax</span><span class="p">,</span> <span class="mi">1</span>
<span class="nf">add</span> <span class="nb">ax</span><span class="p">,</span> <span class="nb">bp</span>
<span class="nf">mov</span> <span class="p">[</span><span class="nb">bp</span><span class="o">-</span><span class="mh">0x00a6</span><span class="p">],</span> <span class="nb">ax</span> <span class="c1">; register spill of ax onto stack? never actually reused</span>
<span class="nf">mov</span> <span class="nb">bx</span><span class="p">,</span> <span class="nb">ax</span> <span class="c1">; now it goes into `bx`? okay...</span>
<span class="nf">mov</span> <span class="nb">ax</span><span class="p">,</span> <span class="p">[</span><span class="nb">bx</span><span class="o">-</span><span class="mh">0x26</span><span class="p">]</span> <span class="c1">; stack variable addressed through bx</span>
<span class="nf">mov</span> <span class="p">[</span><span class="nb">bp</span><span class="o">-</span><span class="mh">0x009c</span><span class="p">],</span> <span class="nb">ax</span></code></pre></figure>

<p>The fact that the stack is addressed through <code class="language-plaintext highlighter-rouge">bx</code> without a segment prefix might be surprising because the default segment register for that is <code class="language-plaintext highlighter-rouge">ds</code>, but remember this is the small memory model, so <code class="language-plaintext highlighter-rouge">ds=ss</code>, and all is well. In any case, this was happening in optimized code, in a relatively complex routine with repeated almost identical blocks of code, and I’ve seen code deduplication performed by this compiler in the past, so I was having a bad feeling looking at this. I’m not going to paste the entire routine in here, but the crux of the issue was that it smelled of some variables being declared as <code class="language-plaintext highlighter-rouge">register</code> (because they went into <code class="language-plaintext highlighter-rouge">si</code> and <code class="language-plaintext highlighter-rouge">di</code>), except they did not behave this way everywhere, oh no. Only in some places, and elsewhere it was as if the <code class="language-plaintext highlighter-rouge">register</code> declaration disappeared. How could a variable have seemingly two different definitions inside one routine? This time it took @AJenbo armed with an LLM to figure it out:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="kt">void</span> <span class="nf">drawProjectionSphere</span><span class="p">(</span><span class="kt">int</span> <span class="n">arg_0</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// register int i, j ❌❌❌ removed</span>

    <span class="c1">// [...]</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">word_38FDC</span> <span class="o">&lt;</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">sub_1FEEC</span><span class="p">(</span><span class="n">arg_0</span><span class="p">);</span>
        <span class="k">return</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="p">{</span>
        <span class="k">register</span> <span class="kt">int</span> <span class="n">i</span><span class="p">;</span> <span class="c1">// ✅ placed in nested scope instead</span>
        <span class="n">a</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="k">do</span> <span class="p">{</span>
            <span class="n">i</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">a</span><span class="p">;</span>
            <span class="o">*</span><span class="p">((</span><span class="kt">int</span> <span class="o">*</span><span class="p">)((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">word_3BE9C</span> <span class="o">+</span> <span class="n">i</span><span class="p">))</span> <span class="o">=</span> <span class="o">*</span><span class="p">((</span><span class="kt">int</span> <span class="o">*</span><span class="p">)((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">word_32990</span> <span class="o">+</span> <span class="n">i</span><span class="p">));</span>
            <span class="n">a</span><span class="o">++</span><span class="p">;</span>
        <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">a</span> <span class="o">&lt;</span> <span class="mi">16</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="n">word_38FC6</span> <span class="o">=</span> <span class="o">-</span><span class="n">var_226</span><span class="p">;</span></code></pre></figure>

<p>Turns out the key to solving it was a bunch of lines above the part I was trying to force to match. By removing the <code class="language-plaintext highlighter-rouge">register</code> variables from the function scope, and introducing nested scopes with these variables, it made the <code class="language-plaintext highlighter-rouge">si</code> and <code class="language-plaintext highlighter-rouge">di</code> registers available outside those scopes, which freed the problematic section to use <code class="language-plaintext highlighter-rouge">si</code>. Kind of obvious in hindsight, but I don’t know how long I would have had to brain at this to come up with nested scopes.</p>

<h2 id="and-now-for-something-completely-different">And now for something completely different</h2>

<p>The routines above were ones we originally thought were the last missing ones, and solving them felt doubly important: first, because we cleared huge obstacles, and second, because we thought we were wrapping up the reconstruction of <code class="language-plaintext highlighter-rouge">egame</code>. A little bit later it turned out some C code was still hiding in the executable, so it was back to the drawing board before we could celebrate completion again. But bottom line, we had our precious source code written out in full.</p>

<p>Now, one of the interesting questions was, if we rebuild all of it with maximum optimizations, will it run? Surely it will run faster?</p>

<p>Well, no. And no. It does not run at all. @AJenbo figured out it was due to some optimizations in a routine having to deal with timer-related variables that changed their values outside of the normal flow of C code. You know, the kind of stuff that you solve with <code class="language-plaintext highlighter-rouge">volatile</code>. Except that MS C 5.1 does not support volatile. Or rather, if memory serves, the docs say something to the effect of it is supported “syntactically, but not functionally”. Which I guess is a smart way to say it doesn’t throw an error, but doesn’t really do anything either.</p>
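
<p>For anyone who hasn’t run into this class of bug, the sketch below shows the general idea on a modern hosted C compiler. It is not the game’s timer code and it has nothing to do with MS C 5.1 specifically: <code class="language-plaintext highlighter-rouge">tick_count</code>, <code class="language-plaintext highlighter-rouge">on_tick</code> and <code class="language-plaintext highlighter-rouge">wait_for_ticks</code> are invented names, and a signal handler stands in for the timer interrupt.</p>

<figure class="highlight"><pre><code class="language-c" data-lang="c">#include &lt;signal.h&gt;

/* Hypothetical counter updated outside the normal flow of the program.
 * volatile tells the compiler to re-read it from memory on every access. */
static volatile sig_atomic_t tick_count = 0;

/* Stand-in for a timer interrupt handler. */
static void on_tick(int signum)
{
    (void)signum;
    tick_count++;
}

/* Keeps raising the signal until `wanted` ticks have been counted. */
int wait_for_ticks(int wanted)
{
    tick_count = 0;
    while (tick_count &lt; wanted) {
        /* Re-installed on every pass because some platforms reset
         * the handler to the default after each delivery. */
        signal(SIGINT, on_tick);
        raise(SIGINT);
    }
    signal(SIGINT, SIG_DFL);
    return (int)tick_count;
}</code></pre></figure>

<p>In this toy version the <code class="language-plaintext highlighter-rouge">raise</code> call already discourages the optimizer from caching the counter. The trouble starts with a bare wait loop that contains no calls at all: without a working <code class="language-plaintext highlighter-rouge">volatile</code>, the compiler is free to read the variable once and never look at memory again, which is roughly the guarantee the optimized build of the game was missing.</p>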

<p>We’re guessing that this could be the reason that some parts of the game were built with the debug mode <code class="language-plaintext highlighter-rouge">/Zi</code> flag: Microprose was building the game in debug mode during development and it worked, but when they tried to build it in turbo mode for release, it broke. With the release deadline probably looming, they went back to <code class="language-plaintext highlighter-rouge">/Zi</code> even though it meant slower code, and left it at that. Guess it’s a lucky thing for us because all that debug code was so much easier to untangle than optimized code.</p>

<p>And with this final example of human fallibility, I leave you tonight, dear reader. Sleep well, but know that the LLMs don’t sleep. 😈</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry><entry><title type="html">The Air Force needs YOU!</title><link href="/f15-se2/2026/06/20/needyou.html" rel="alternate" type="text/html" title="The Air Force needs YOU!" /><published>2026-06-20T00:00:00+00:00</published><updated>2026-06-20T00:00:00+00:00</updated><id>/f15-se2/2026/06/20/needyou</id><content type="html" xml:base="/f15-se2/2026/06/20/needyou.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<p>I must admit the rate of progress currently experienced in the project is overwhelming. A little over a month ago it seemed that we had several more years of laborious rewriting of assembly into C before the second game executable (<code class="language-plaintext highlighter-rouge">egame</code>) started looking like something, and the third one (<code class="language-plaintext highlighter-rouge">end</code>) still to go for dessert. Meanwhile, as of the time of writing this, all C code has been reconstructed for all executables, all data has been moved from assembly into C, most of the assembly-only code has functional replacements written in C, most routines and data structures have been assigned meaningful names, and we’re looking at forking off the repo for a porting project in the near future.</p>

<p>However, this explosive growth in completeness and capability also means that we’re abandoning the relatively peaceful domain of just looking at whether the reconstructed opcodes match, and we actually need to maintain a running game going forward. The tooling makes sure that the opcodes stay faithful to the original as we continue to make changes, but it cannot catch all bugs, particularly not the ones that have to do with data layout.</p>

<h2 id="test-pilots-wanted">Test pilots wanted</h2>

<p>Seeing how community involvement has allowed the project to flourish, I was hoping we could ask for a little bit more help. The F-15 Strike Eagle 2 reconstruction is now open and ready for test pilots to take to the digital skies and find any bugs that we might have missed. Right now, the <a href="https://github.com/neuviemeporte/f15se2-re/releases/tag/v0.9.1">latest release</a> is <code class="language-plaintext highlighter-rouge">v0.9.1</code> and it should work with the original game’s <code class="language-plaintext highlighter-rouge">451.03</code> version with the Desert Storm expansion pack - just drop the executables into the game folder replacing the original ones (make a backup beforehand), possibly removing the original <code class="language-plaintext highlighter-rouge">f15.com</code> to make sure it does not get launched in place of the new <code class="language-plaintext highlighter-rouge">f15.exe</code>, and take off. It will not go into the setup screen, instead assuming an MCGA/VGA display with no sound and no joystick. But everything else should work in all 3 parts of the game (mission briefing, flight and debriefing).</p>

<p>If anything <em>does not</em> work, we would appreciate <a href="https://github.com/neuviemeporte/f15se2-re/issues">bug reports</a>. We are looking for crashes, graphical glitches, keys not working etc. Consider attaching a screenshot (<code class="language-plaintext highlighter-rouge">Ctrl+F5</code> in DOSBox) if it’s useful. A description of what was being done before the issue occurred will be helpful to us in reproducing the problem and hopefully developing a fix.</p>

<p>It’s important to note that this is a bug-for-bug reconstruction, so any behaviour also present in the original game needs to stay as is (for now). The original has some problems with 3d objects disappearing, the plane falling towards the sky when inverted and out of fuel etc. So before reporting an issue, it would be best to make sure it does not occur in the original, and keeping a copy around for reference might be a good idea.</p>

<p>Thank you to everybody who decides to help and thanks to everyone who contributed to the project thus far, allowing it to reach this milestone. I’m looking forward to the next ones, and I’m happy y’all are along for the ride.</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry><entry><title type="html">Alive and kicking</title><link href="/f15-se2/2026/06/06/alive.html" rel="alternate" type="text/html" title="Alive and kicking" /><published>2026-06-06T00:00:00+00:00</published><updated>2026-06-06T00:00:00+00:00</updated><id>/f15-se2/2026/06/06/alive</id><content type="html" xml:base="/f15-se2/2026/06/06/alive.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<blockquote>
  <p>I am happy to say the reconstruction for EGAME.EXE is progressing smoothly.(…) –me, more than a year ago</p>
</blockquote>

<p>Famous last words, right? After typing that, I had made some more progress on the single biggest routine of the <code class="language-plaintext highlighter-rouge">egame</code> executable, which is still called <code class="language-plaintext highlighter-rouge">otherKeyDispatch()</code> (for lack of a better name, because it seems to switch on a keycode value), and then all work on the project effectively ceased. I worked on completing the routine only sporadically, and just couldn’t get through it. It was not about some new difficulty. I had simply become completely burned out by the reconstruction process with only about 30 lines of C to go, and was unable to continue.</p>

<p>It didn’t help that I also made a slight detour while (not) working on the reconstruction. I had mused about applying a “neural network” to the project as far back as 2022, but back then it seemed like something out of science fiction. However, we’re now living through an LLM revolution (leaving the behaviour of <a href="https://isaiprofitable.com/">financial markets</a> and social and environmental costs out of it for now), and having had a taste myself at work, I decided it would be worth trying to delegate some of the most boring and repetitive work. Having incomplete or totally hallucinated answers would be fine because I have my tooling and could verify the results easily enough. Really any amount of assistance would have been invaluable to me. And the task really is perfect for applying an LLM - it’s completely mechanistic, non-creative, based on correlative knowledge that’s implicit from the relationship between the binaries, the old DOS C compiler, and the reconstructed source code, and never explicitly stated.</p>

<p>But I wasn’t happy with just using Copilot from VS Code. This project was always a little about keeping my interest in technology fresh, so I decided to learn a little bit more and create my own, self-hosted setup for this purpose, using one of the free open weights models. I had recently bought a decent GPU so I used <a href="https://ollama.com/">Ollama</a> to run the models, and the <a href="https://www.continue.dev/">Continue</a> extension to bridge the model to VS Code. It’s a long story that I want to address separately at some point, but the bottom line was that I did not obtain very useful results this way. The biggest models I could realistically run on my hardware fell a little bit short, and I concluded that my setup was lacking enough high quality context for the model to operate off of. So I spent additional time <a href="https://github.com/neuviemeporte/mzretools/blob/master/tools/rag_index.py">developing</a> a custom <a href="https://en.wikipedia.org/wiki/Retrieval-augmented_generation">RAG</a> <a href="https://github.com/neuviemeporte/mzretools/blob/master/tools/rag_mcp.py">solution</a> to create and employ a database of assembly-to-C source snippets that the model could use as a reference without having it ingest my entire codebase (which wouldn’t fit anyway).</p>

<p>All of this took a lot of time without having anything to show for my efforts, and depression and guilt slowly creeping in. This is when I got an intriguing message from a potential collaborator.</p>

<h1 id="christmas-comes-early">Christmas comes early</h1>

<p>I was being graced by the presence of an experienced <a href="https://github.com/AJenbo">reverse engineer</a> and veteran of <a href="https://github.com/isledecomp/isle">multiple</a> <a href="https://github.com/diasurgical/DevilutionX">successful</a> decompilation projects, who quickly started producing an almost overwhelming stream of commits and PRs into my repository. Using LLMs, he astonishingly was able to basically start and finish reconstructing the last executable, <code class="language-plaintext highlighter-rouge">end.exe</code>, within a couple of days, with me just having to fix a couple of simple problems that Silicon Steve was unsuccessful at resolving. I quickly decided to abandon my old workflow which was dependent on IDA, because synchronizing access to binary <code class="language-plaintext highlighter-rouge">.idb</code> databases in git was not going to be feasible, and would only hamper progress on the reconstruction, which was then happening at lightning speed. We now have:</p>

<ul>
  <li>LLM-generated routine/variable names in <code class="language-plaintext highlighter-rouge">start.exe</code> which contained mostly stub names despite being fully reconstructed by me. It’s fine if not all of these are spot on, names are easier to deal with than random numbers and can be changed later as more information is added.</li>
  <li>a fully reconstructed, debugged and working <code class="language-plaintext highlighter-rouge">end.exe</code>, also with some autogenerated symbol names.</li>
  <li>a somewhat functional but still unstable main game executable (<code class="language-plaintext highlighter-rouge">egame.exe</code>), with most (90%+) C code reconstructed, and the rest in progress.</li>
</ul>

<p>So, we seem to have had a minor resurrection miracle in the project, and I wish to extend my sincerest thanks to @AJenbo for showing up and saving the day when he did.</p>

<p>It’s worth mentioning here that all of this would not have been possible without the work @AJenbo did on <a href="https://github.com/AJenbo/ghidra/tree/16bit">Ghidra</a> to make it better support 16bit code. The Ghidra decompiler is pretty great, and I’ve <a href="/f15-se2/2024/05/05/ghidra.html">used it before</a>, but it was a grind because it does not really support segmented addressing and some quirks of the 16bit architecture. With that support added in, the output of Ghidra is used as a starting point for LLM agents which iterate over the reconstruction, invoking <code class="language-plaintext highlighter-rouge">mzretools</code> to check their work. But those sweet 16bit changes are not likely to be accepted into upstream Ghidra, so as far as I understand, @AJenbo’s repository remains the single location where this tooling can be obtained.</p>

<p>So, for now at least, the project has had new life pumped into it and is back on track, thanks to collaborators (there are multiple now), whom I wish to personally thank for their effort and dedication.</p>

<h1 id="back-into-it-now">Back into it now</h1>

<p>This blog was always about documenting some of the more esoteric quirks of the MS C compiler, and going through the reconstruction has yielded a fresh supply of those that I wish to present. But I must say that working on code that has been pre-chewed by Ghidra and LLMs is much more pleasant than having to write everything out manually. I can focus on the meaningful differences where the machine was unable to make progress, and where out-of-the-box thinking might still be useful.</p>

<h2 id="some-registers-are-more-unsigned-than-others">Some registers are more unsigned than others</h2>

<p>We’re looking at this C source line:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">gfx_copyRect</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">b</span> <span class="o">-</span> <span class="mi">3</span><span class="p">,</span> <span class="n">c</span> <span class="o">-</span> <span class="mi">3</span><span class="p">,</span> <span class="n">byte_3C5A0</span><span class="p">,</span> <span class="n">b</span> <span class="o">-</span> <span class="mi">3</span><span class="p">,</span> <span class="n">c</span> <span class="o">-</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">6</span><span class="p">);</span></code></pre></figure>

<p>Unfortunately, this does not match when comparing. The original loads the values of <code class="language-plaintext highlighter-rouge">c</code> and <code class="language-plaintext highlighter-rouge">b</code> directly into the index registers (<code class="language-plaintext highlighter-rouge">si</code> and <code class="language-plaintext highlighter-rouge">di</code>) to perform the subtractions, while our reconstruction goes through <code class="language-plaintext highlighter-rouge">ax</code> instead:</p>

<pre>
0000:0b54/000b54: mov si, [bp-0x06]                != 0000:0715/000715: mov ax, [bp-0x06] ; 🤨
<r>ERROR: Instruction mismatch in routine updateFrame at 0000:0b54/000b54: mov si, [bp-0x06] != 0000:0715/000715: mov ax, [bp-0x06]</r>
--- Context information for up to 20 additional instructions of routine updateFrame after mismatch location:
0000:0b57/000b57: sub si, 0x3                      != 0000:0718/000718: sub ax, 0x3 
0000:0b5a/000b5a: mov di, [bp-0x08]                != 0000:071b/00071b: mov si, ax
0000:0b5d/000b5d: sub di, 0x3                      != 0000:071d/00071d: mov ax, [bp-0x08]
0000:0b60/000b60: mov ax, 0x6                      != 0000:0720/000720: sub ax, 0x3
[...]
</pre>

<p>I couldn’t tell you why, but the signedness of the argument in the function declaration makes the difference:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// wrong</span>
<span class="kt">int</span> <span class="n">FAR</span> <span class="n">CDECL</span> <span class="nf">gfx_copyRect</span><span class="p">(</span><span class="kt">int</span> <span class="n">srcPage</span><span class="p">,</span> <span class="kt">int</span> <span class="n">srcX</span><span class="p">,</span> <span class="kt">int</span> <span class="n">srcY</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dstPage</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dstX</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dstY</span><span class="p">,</span> <span class="kt">int</span> <span class="n">width</span><span class="p">,</span> <span class="kt">int</span> <span class="n">height</span><span class="p">);</span>
<span class="c1">// right</span>
<span class="kt">int</span> <span class="n">FAR</span> <span class="n">CDECL</span> <span class="nf">gfx_copyRect</span><span class="p">(</span><span class="kt">int</span> <span class="n">srcPage</span><span class="p">,</span> <span class="n">uint16</span> <span class="n">srcX</span><span class="p">,</span> <span class="n">uint16</span> <span class="n">srcY</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dstPage</span><span class="p">,</span> <span class="n">uint16</span> <span class="n">dstX</span><span class="p">,</span> <span class="n">uint16</span> <span class="n">dstY</span><span class="p">,</span> <span class="kt">int</span> <span class="n">width</span><span class="p">,</span> <span class="kt">int</span> <span class="n">height</span><span class="p">);</span></code></pre></figure>

<p>I don’t know why, but my intuition with seeing nonsense like this is usually to try fiddling with the signedness. Why is <code class="language-plaintext highlighter-rouge">si</code> better for unsigned? No idea, but that’s what MS C does.</p>

<h2 id="this-pointer-is-huge">This pointer is HUGE</h2>

<p>Ghidra sure made a mess of this one and the LLM couldn’t figure it out:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="n">far</span> <span class="o">*</span><span class="p">)((</span><span class="kt">char</span> <span class="n">far</span> <span class="o">*</span><span class="p">)</span><span class="n">commData</span> <span class="o">-</span> <span class="mi">4</span><span class="p">)</span> <span class="o">!=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="mh">0xca01</span> <span class="o">||</span>
    <span class="o">*</span><span class="p">(</span><span class="kt">int</span> <span class="n">far</span> <span class="o">*</span><span class="p">)((</span><span class="kt">char</span> <span class="n">far</span> <span class="o">*</span><span class="p">)</span><span class="n">commData</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span> <span class="o">!=</span> <span class="mh">0x3b9a</span><span class="p">)</span> <span class="p">{</span></code></pre></figure>

<p>This is a condition which checks whether the <a href="https://stanislavs.org/helppc/memory_control_block.html">MCB</a> preceding the <code class="language-plaintext highlighter-rouge">COMM</code> structure (which is used to communicate data between different parts of the game) contains a magic checksum. If not, it quits the simulation immediately. I traced the register values while the equivalent assembly code was executing.</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="nl">seg000:</span><span class="err">0</span><span class="nf">CEA</span>		    <span class="nv">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">0FFFCh</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">CED</span>		    <span class="nv">cwd</span>                               <span class="c1">; dx:ax = ffff fffc</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">CEE</span>		    <span class="nv">add</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="kt">word</span> <span class="nv">ptr</span> <span class="nv">commData</span>     <span class="c1">; [commData] = 0</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">CF2</span>		    <span class="nv">adc</span>	    <span class="nb">dx</span><span class="p">,</span>	<span class="mi">0</span>                     <span class="c1">; no change</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">CF5</span>		    <span class="nv">mov</span>	    <span class="nb">cx</span><span class="p">,</span>	<span class="mh">0Ch</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">CF8</span>		    <span class="nv">shl</span>	    <span class="nb">dx</span><span class="p">,</span>	<span class="nb">cl</span>                    <span class="c1">; dx = f000</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">CFA</span>		    <span class="nv">add</span>	    <span class="nb">dx</span><span class="p">,</span>	<span class="kt">word</span> <span class="nv">ptr</span> <span class="nv">commData</span><span class="o">+</span><span class="mi">2</span>   <span class="c1">; [commData+2] = 1554, dx = 554</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">CFE</span>		    <span class="nv">mov</span>	    <span class="nb">es</span><span class="p">,</span>	<span class="nb">dx</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">D00</span>		    <span class="nv">mov</span>	    <span class="nb">bx</span><span class="p">,</span>	<span class="nb">ax</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">D02</span>		    <span class="nv">cmp</span>	    <span class="kt">word</span> <span class="nv">ptr</span> <span class="nb">es</span><span class="p">:[</span><span class="nb">bx</span><span class="p">],</span> <span class="mh">0CA01h</span>  <span class="c1">; es:bx = 554:fffc</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">D07</span>		    <span class="nv">jnz</span>	    <span class="nv">short</span> <span class="nv">loc_10D11</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">D09</span>		    <span class="nv">cmp</span>	    <span class="kt">word</span> <span class="nv">ptr</span> <span class="nb">es</span><span class="p">:[</span><span class="nb">bx</span><span class="o">+</span><span class="mi">2</span><span class="p">],</span>	<span class="mh">3B9Ah</span> <span class="c1">; magic checksum: 0x3b9aca01</span>
<span class="nl">seg000:</span><span class="err">0</span><span class="nf">D0F</span>		    <span class="nv">jz</span>	    <span class="nv">short</span> <span class="nv">loc_10D20</span></code></pre></figure>

<p>It uses the immediate value of <code class="language-plaintext highlighter-rouge">0xfffc</code> which is equivalent to -4 to step backwards from the far address of the allocated structure (<code class="language-plaintext highlighter-rouge">1554:0</code> in this case) to land at the expected location of the checksum, then dereferences the obtained pointer (<code class="language-plaintext highlighter-rouge">554:fffc</code>) for the check. Unfortunately, the code generated by the compiler from the Ghidra decompilation doesn’t match:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">ab</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">ab</span><span class="o">:</span> <span class="n">mov</span> <span class="n">ax</span><span class="p">,</span> <span class="p">[</span><span class="mh">0xa104</span><span class="p">]</span> <span class="p">;</span> <span class="n">commData</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">ae</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">ae</span><span class="o">:</span> <span class="n">mov</span> <span class="n">dx</span><span class="p">,</span> <span class="p">[</span><span class="mh">0xa106</span><span class="p">]</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">b2</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">b2</span><span class="o">:</span> <span class="n">sub</span> <span class="n">ax</span><span class="p">,</span> <span class="mh">0x4</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">b5</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">b5</span><span class="o">:</span> <span class="n">sbb</span> <span class="n">dx</span><span class="p">,</span> <span class="mh">0x0</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">b8</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">b8</span><span class="o">:</span> <span class="n">mov</span> <span class="n">es</span><span class="p">,</span> <span class="n">dx</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">ba</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">ba</span><span class="o">:</span> <span class="n">mov</span> <span class="n">bx</span><span class="p">,</span> <span class="n">ax</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">bc</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">bc</span><span class="o">:</span> <span class="n">cmp</span> <span class="n">word</span> <span class="n">es</span><span class="o">:</span><span class="p">[</span><span class="n">bx</span><span class="p">],</span> <span class="mh">0x3b9a</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">c1</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">c1</span><span class="o">:</span> <span class="n">jnz</span> <span class="mh">0x8cb</span> <span class="p">(</span><span class="mh">0xa</span> <span class="n">down</span><span class="p">)</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">c3</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">c3</span><span class="o">:</span> <span class="n">cmp</span> <span class="n">word</span> <span class="n">es</span><span class="o">:</span><span class="p">[</span><span class="n">bx</span><span class="o">+</span><span class="mh">0x02</span><span class="p">],</span> <span class="mh">0xca01</span>
<span class="mo">0000</span><span class="o">:</span><span class="mi">08</span><span class="n">c9</span><span class="o">/</span><span class="mo">000</span><span class="mi">8</span><span class="n">c9</span><span class="o">:</span> <span class="n">jz</span> <span class="mh">0x8da</span> <span class="p">(</span><span class="mh">0x11</span> <span class="n">down</span><span class="p">)</span></code></pre></figure>

<p>This had me scratching my head for a while before realizing that far pointers perform arithmetic on the offset part only, and this <code class="language-plaintext highlighter-rouge">add/adc/shl</code> sequence seems to be performing addition with carry on a full 32bit far pointer value, which is exactly what “huge” pointers are for under DOS. The other part of the insight is that the two <code class="language-plaintext highlighter-rouge">cmp</code> instructions seem to operate on the same logical value, so it’s not likely to be two separate comparisons in the actual C code. Sure enough, it was just a matter of casting the pointer to the structure into a huge pointer to a single <code class="language-plaintext highlighter-rouge">int32</code>, going back one (4 bytes) and dereferencing:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="p">((</span><span class="n">int32</span> <span class="n">huge</span> <span class="o">*</span><span class="p">)</span><span class="n">commData</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">!=</span> <span class="mh">0x3b9aca01</span><span class="p">)</span> <span class="p">{</span></code></pre></figure>

<p>Interestingly, now it becomes evident that the magic value of <code class="language-plaintext highlighter-rouge">0x3b9aca01</code> is actually one billion and one. I was curious about that, but it didn’t make sense why this particular value was picked when looking at it in halves, or as ASCII codes.</p>
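<p>For anyone who wants to double check the register trace without firing up a debugger, here is a minimal sketch in portable C that replays the same <code class="language-plaintext highlighter-rouge">add/adc/shl/add</code> steps on 16bit values. The names <code class="language-plaintext highlighter-rouge">farptr</code>, <code class="language-plaintext highlighter-rouge">linear()</code>, <code class="language-plaintext highlighter-rouge">huge_add()</code>, <code class="language-plaintext highlighter-rouge">low_word()</code> and <code class="language-plaintext highlighter-rouge">high_word()</code> are made up for this illustration and do not come from the game or from MS C. It only models the instruction sequence shown in the listing, not everything the compiler might do for huge pointers.</p>

```c
#include <stdint.h>

/* Hypothetical model of a real-mode seg:off pointer (not from the game). */
typedef struct { uint16_t seg, off; } farptr;

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t linear(farptr p) {
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Replays the traced instruction sequence on 16-bit registers. */
static farptr huge_add(farptr p, int32_t delta) {
    uint16_t ax = (uint16_t)delta;                    /* mov ax, 0FFFCh       */
    uint16_t dx = (uint16_t)((uint32_t)delta >> 16);  /* cwd                  */
    uint32_t sum = (uint32_t)ax + p.off;              /* add ax, [commData]   */
    farptr r;
    ax = (uint16_t)sum;
    dx = (uint16_t)(dx + (sum >> 16));                /* adc dx, 0            */
    dx = (uint16_t)(dx << 12);                        /* shl dx, cl (cl=0Ch)  */
    dx = (uint16_t)(dx + p.seg);                      /* add dx, [commData+2] */
    r.seg = dx;
    r.off = ax;
    return r;
}

/* The two halves of a 32-bit value as they sit in little-endian memory. */
static uint16_t low_word(uint32_t v)  { return (uint16_t)(v & 0xffffu); }
static uint16_t high_word(uint32_t v) { return (uint16_t)(v >> 16); }
```

<p>Feeding it <code class="language-plaintext highlighter-rouge">1554:0000</code> and a delta of -4 gives <code class="language-plaintext highlighter-rouge">0554:fffc</code>, which is the same linear address as the original pointer minus 4 bytes. The word helpers split <code class="language-plaintext highlighter-rouge">0x3b9aca01</code> into <code class="language-plaintext highlighter-rouge">0xca01</code> (at the lower address) and <code class="language-plaintext highlighter-rouge">0x3b9a</code>, which is the order of the two <code class="language-plaintext highlighter-rouge">cmp</code> instructions in the original.</p>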

<h2 id="compiler-entropy">Compiler entropy</h2>

<p>Until @AJenbo pointed this one out to me, I must admit I was a little bit naive, assuming that MS C was so simple and well behaved that it exhibited no non-deterministic behaviour and that I had Special Powers in predicting (or post-rationalizing) what it would do exactly in a given situation. Oh you sweet summer child.</p>

<p>It begins simple enough with reconstructing a short C routine:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="kt">void</span> <span class="nf">sub_160D3</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">arg_0</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">while</span> <span class="p">(</span><span class="o">*</span><span class="n">arg_0</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">gfx_jump_21</span><span class="p">(((</span><span class="n">uint8</span> <span class="o">*</span><span class="p">)</span><span class="n">word_3419C</span><span class="p">)[</span><span class="o">*</span><span class="n">arg_0</span><span class="o">++</span><span class="p">]);</span>
        <span class="n">sub_2171A</span><span class="p">();</span>
        <span class="n">arg_0</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">;</span>
        <span class="k">while</span> <span class="p">(</span><span class="o">*</span><span class="n">arg_0</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">var_351</span> <span class="o">=</span> <span class="n">arg_0</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">];</span>
            <span class="n">var_353</span> <span class="o">=</span> <span class="n">arg_0</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">];</span>
            <span class="n">var_352</span> <span class="o">=</span> <span class="o">*</span><span class="n">arg_0</span><span class="o">++</span><span class="p">;</span>
            <span class="n">var_354</span> <span class="o">=</span> <span class="o">*</span><span class="n">arg_0</span><span class="o">++</span><span class="p">;</span>
            <span class="n">sub_2189C</span><span class="p">();</span>
        <span class="p">}</span>
        <span class="n">sub_21704</span><span class="p">();</span>
        <span class="n">arg_0</span><span class="o">++</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span></code></pre></figure>

<p>Rebuilding the reconstruction and running it through verification fails, but oddly enough not in the routine which was just reconstructed, but in a different one:</p>

<pre>
<r>WARNING: Unable to determine location of routine sub_11841 in target executable. Last resort pattern searching found likely location 0000:7f95/007f95, but it may be completely wrong so false negative or positive is possible!</r>
--- Now @0000:1841/001841, routine 0000:1841-0000:18d4[000094]: sub_11841 [near] [complete], block 001841-0018d4[000094], target @0000:7f95/007f95
0000:1841/001841: push bp                          == 0000:7f95/007f95: push bp
0000:1842/001842: mov bp, sp                       == 0000:7f96/007f96: mov bp, sp
0000:1844/001844: sub sp, 0x4                      =~ 0000:7f98/007f98: sub sp, 0xc
0000:1847/001847: push di                          == 0000:7f9b/007f9b: push di
0000:1848/001848: push si                          == 0000:7f9c/007f9c: push si
0000:1849/001849: cmp word [0xe46], 0xff           ~~ 0000:7f9d/007f9d: cmp word [0xe8a], 0x0 ; var_116 / ?
0000:184e/00184e: jz 0x18cf (0x81 down)            ~= 0000:7fa2/007fa2: jz 0x7fad (0xb down)
<r>0000:1850/001850: mov word [bp-0x02], 0x0          != 0000:7fa4/007fa4: mov ax, [0xe76]</r>
</pre>

<p>The warning was actually not originally there, which made this significantly more mysterious. In general, <code class="language-plaintext highlighter-rouge">mzdiff</code> adds code locations to its scan queue based on calls and jumps it sees while comparing opcodes. In this case, it never saw a call to <code class="language-plaintext highlighter-rouge">sub_11841</code>, but it was present in the map, so its address was added to the scan queue, but without a corresponding target location in the reconstructed executable. Then, just as the warning says, it attempted a last resort match by scanning the target for the routine’s exact opcode bytes, increasing the number of bytes until only a single match remained. In this case, this process yielded the location <code class="language-plaintext highlighter-rouge">0x7f95</code>, but this is a red herring. I’m not exactly sure what routine this is, because it came from assembly and is not public, so I can’t see it in the linker map, but it doesn’t really matter. What I can see is the real <code class="language-plaintext highlighter-rouge">sub_11841</code> though:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ninja@RYZEN:f15se2-re$ grep sub_11841 build/egame.map
 0000:327A       _sub_11841
</code></pre></div></div>
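<p>As an aside, the last resort search described above can be sketched roughly like this. To be clear, this is not the actual <code class="language-plaintext highlighter-rouge">mzdiff</code> code. The function names are hypothetical, and I’m assuming a plain byte-for-byte comparison, ignoring any operand masking the real tool may apply.</p>

```c
#include <stddef.h>
#include <string.h>

/* Count how many times the first len bytes of pat occur in hay,
 * remembering the offset of the last hit. Hypothetical helper. */
static size_t count_matches(const unsigned char *hay, size_t hay_len,
                            const unsigned char *pat, size_t len,
                            size_t *last_pos) {
    size_t count = 0;
    size_t i;
    if (len == 0 || len > hay_len) return 0;
    for (i = 0; i + len <= hay_len; ++i) {
        if (memcmp(hay + i, pat, len) == 0) {
            ++count;
            *last_pos = i;
        }
    }
    return count;
}

/* Grow the search pattern one byte at a time until exactly one candidate
 * location is left. Returns 1 and stores its offset in *pos on success,
 * or 0 if the candidates run out or the routine is exhausted first. */
static int find_unique(const unsigned char *hay, size_t hay_len,
                       const unsigned char *routine, size_t routine_len,
                       size_t min_len, size_t *pos) {
    size_t len;
    for (len = min_len; len <= routine_len; ++len) {
        size_t hit = 0;
        size_t n = count_matches(hay, hay_len, routine, len, &hit);
        if (n == 0) return 0;                 /* nothing left to try  */
        if (n == 1) { *pos = hit; return 1; } /* unique location found */
    }
    return 0;
}
```

<p>The weakness is visible even in this toy version. Whichever candidate survives the longest wins, and that does not have to be the routine we are actually looking for if the real one diverges early.</p>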

<p>Let’s manually set the comparison offsets with mzdiff to check this routine:</p>

<pre>
ninja@RYZEN:f15se2-re$ mzdiff --verbose --loose bin/egame.exe:0x1841 build/egame.exe:0x327a
Comparing code between reference (entrypoint 0000:1841/001841) and target (entrypoint 0000:327a/00327a) executables
New comparison location 0000:1841/001841, queue size = 0
--- Comparing reference @ 0000:1841/001841 to target @0000:327a/00327a
WARNING: Unable to find target entrypoint for routine unknown
0000:1841/001841: push bp                          == 0000:327a/00327a: push bp
0000:1842/001842: mov bp, sp                       == 0000:327b/00327b: mov bp, sp
0000:1844/001844: sub sp, 0x4                      == 0000:327d/00327d: sub sp, 0x4
0000:1847/001847: push di                          == 0000:3280/003280: push di
0000:1848/001848: push si                          == 0000:3281/003281: push si
0000:1849/001849: cmp word [0xe46], 0xff           ~= 0000:3282/003282: cmp word [0xe88], 0xff
0000:184e/00184e: jz 0x18cf (0x81 down)            != 0000:3287/003287: jnz 0x328c (0x5 down) ; BOOM!
<r>ERROR: Instruction mismatch in routine unknown at 0000:184e/00184e: jz 0x18cf != 0000:3287/003287: jnz 0x328c</r>
--- Context information for up to 10 additional instructions of routine unknown after mismatch location:
0000:1850/001850: mov word [bp-0x02], 0x0          != 0000:3289/003289: jmp 0x3314 (0x8b down)
0000:1855/001855: jmp short 0x185a (0x5 down)      != 0000:328c/00328c: mov word [bp-0x02], 0x0
0000:1857/001857: inc [bp-0x02]                    != 0000:3291/003291: jmp short 0x3296 (0x5 down)
0000:185a/00185a: cmp word [bp-0x02], 0x8          != 0000:3293/003293: inc [bp-0x02]
0000:185e/00185e: jge 0x187f (0x21 down)           != 0000:3296/003296: cmp word [bp-0x02], 0x8
0000:1860/001860: mov si, [bp-0x02]                != 0000:329a/00329a: jge 0x32bb (0x21 down)
0000:1863/001863: mov cl, 0x3                      != 0000:329c/00329c: mov si, [bp-0x02]
0000:1865/001865: shl si, cl                       != 0000:329f/00329f: mov cl, 0x3
0000:1867/001867: add word [si+0x0b56], 0xa        != 0000:32a1/0032a1: shl si, cl
0000:186c/00186c: mov ax, [si+0x0b56]              != 0000:32a3/0032a3: add word [si+0x0b98], 0xa
</pre>

<p>The early difference made the opcode scanning last resort heuristic reject this as a viable comparison location for <code class="language-plaintext highlighter-rouge">sub_11841</code>. The original <code class="language-plaintext highlighter-rouge">jz +0x81</code> turned into <code class="language-plaintext highlighter-rouge">jnz +5; jmp +0x8b</code>, because the destination for the jump went out of bounds for the relative 8bit jump of the <code class="language-plaintext highlighter-rouge">jz</code> instruction. Let’s try following this thread by again setting the comparison offsets and nudging mzdiff past this difference, which is just an artifact of a later discrepancy. This actually lets me show off the latest addition to mzdiff, which is the display of symbol names from the maps of the compared executables in comments. This is useful since after ditching the manual, IDA-based reconstruction process, we don’t have comments in the C code denoting the assembly offsets that I was relying on until now to find my way around the assembly - now the surest way is to look at the symbol names and try to correlate the problematic section to C code that way.</p>
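<p>The range problem is easy to verify by hand. On the 8086, a conditional jump only carries a signed 8bit displacement, counted from the address of the next instruction. Here is a small sketch of that check. The helper names are mine and purely illustrative, and the addresses plugged in afterwards are taken from the listings above.</p>

```c
#include <stdint.h>

/* A short conditional jump is 2 bytes long: opcode + rel8. */
enum { SHORT_JCC_LEN = 2 };

/* Displacement as the CPU sees it: relative to the next instruction. */
static int32_t short_jump_disp(uint16_t insn_addr, uint16_t dest_addr) {
    return (int32_t)dest_addr - ((int32_t)insn_addr + SHORT_JCC_LEN);
}

/* Returns 1 if the destination is reachable with a signed 8-bit offset. */
static int fits_rel8(uint16_t insn_addr, uint16_t dest_addr) {
    int32_t d = short_jump_disp(insn_addr, dest_addr);
    return d >= -128 && d <= 127;
}
```

<p>In the original, the <code class="language-plaintext highlighter-rouge">jz</code> at <code class="language-plaintext highlighter-rouge">0x184e</code> going to <code class="language-plaintext highlighter-rouge">0x18cf</code> needs a displacement of 127, which is the very last value that fits. In the reconstruction, a conditional jump at <code class="language-plaintext highlighter-rouge">0x3287</code> going to <code class="language-plaintext highlighter-rouge">0x3314</code> would need 139, so any growth at all in the body of the routine was going to tip it over and force the inverted jump over a near <code class="language-plaintext highlighter-rouge">jmp</code>.</p>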

<pre>
ninja@RYZEN:f15se2-re$ mzdiff --verbose --loose bin/egame.exe:0x1850 build/egame.exe:0x328c --map map/egame.map --tmap build/egame.map:link
Loading target map from build/egame.map, tag: link
Comparing code between reference (entrypoint 0000:1850/001850) and target (entrypoint 0000:328c/00328c) executables
New comparison location 0000:1850/001850, queue size = 0
--- Now @0000:1850/001850, routine 0000:1841-0000:18d4[000094]: sub_11841 [near] [complete], block 001841-0018d4[000094], target @0000:328c/00328c
0000:1850/001850: mov word [bp-0x02], 0x0          == 0000:328c/00328c: mov word [bp-0x02], 0x0
0000:1855/001855: jmp short 0x185a (0x5 down)      == 0000:3291/003291: jmp short 0x3296 (0x5 down)
0000:1857/001857: inc [bp-0x02]                    == 0000:3293/003293: inc [bp-0x02]
0000:185a/00185a: cmp word [bp-0x02], 0x8          == 0000:3296/003296: cmp word [bp-0x02], 0x8
0000:185e/00185e: jge 0x187f (0x21 down)           == 0000:329a/00329a: jge 0x32bb (0x21 down)
0000:1860/001860: mov si, [bp-0x02]                == 0000:329c/00329c: mov si, [bp-0x02]
0000:1863/001863: mov cl, 0x3                      == 0000:329f/00329f: mov cl, 0x3
0000:1865/001865: shl si, cl                       == 0000:32a1/0032a1: shl si, cl
0000:1867/001867: add word [si+0x0b56], 0xa        ~= 0000:32a3/0032a3: add word [si+0x0b98], 0xa ; var_90 / ?
0000:186c/00186c: mov ax, [si+0x0b56]              =~ 0000:32a8/0032a8: mov ax, [si+0x0b98] ; var_90 / ?
0000:1870/001870: mov cl, 0x9                      == 0000:32ac/0032ac: mov cl, 0x9
0000:1872/001872: sar ax, cl                       == 0000:32ae/0032ae: sar ax, cl
0000:1874/001874: add [si+0x0b54], ax              ~= 0000:32b0/0032b0: add [si+0x0b96], ax ; var_89 / ?
0000:1878/001878: add byte [si+0x0b59], 0x6        ~= 0000:32b4/0032b4: add byte [si+0x0b9b], 0x6 ; var_92 / ?
0000:187d/00187d: jmp short 0x1857 (0x26 up)       == 0000:32b9/0032b9: jmp short 0x3293 (0x26 up)
0000:187f/00187f: test byte [0xe38], 0xf           ~= 0000:32bb/0032bb: test byte [0xe7a], 0xf ; var_109 / word_336E8
0000:1884/001884: jnz 0x18cf (0x4b down)           ~= 0000:32c0/0032c0: jnz 0x3314 (0x54 down)
0000:1886/001886: mov ax, [0xe38]                  =~ 0000:32c2/0032c2: mov ax, [0xe7a] ; var_109 / word_336E8
0000:1889/001889: mov cl, 0x4                      == 0000:32c5/0032c5: mov cl, 0x4
0000:188b/00188b: sar ax, cl                       == 0000:32c7/0032c7: sar ax, cl
0000:188d/00188d: and ax, 0x7                      == 0000:32c9/0032c9: and ax, 0x7
0000:1890/001890: mov [bp-0x04], ax                == 0000:32cc/0032cc: mov [bp-0x04], ax
0000:1893/001893: mov si, ax                       == 0000:32cf/0032cf: mov si, ax
0000:1895/001895: mov cl, 0x3                      == 0000:32d1/0032d1: mov cl, 0x3
0000:1897/001897: shl si, cl                       == 0000:32d3/0032d3: shl si, cl
0000:1899/001899: mov di, [0xe46]                  =~ 0000:32d5/0032d5: mov di, [0xe88] ; var_116 / word_336F6
0000:189d/00189d: mov cl, 0x4                      == 0000:32d9/0032d9: mov cl, 0x4
0000:189f/00189f: shl di, cl                       == 0000:32db/0032db: shl di, cl
0000:18a1/0018a1: mov ax, [di-0x7e52]              =~ 0000:32dd/0032dd: mov ax, [di-0x7b26] ; var_761 / stru_3AA5E
0000:18a5/0018a5: mov [si+0x0b52], ax              ~= 0000:32e1/0032e1: mov [si+0x0b94], ax ; var_88 / stru_33402
0000:18a9/0018a9: mov ax, [di-0x7e50]              =~ 0000:32e5/0032e5: mov ax, [di-0x7b24] ; var_762 / ?
0000:18ad/0018ad: mov [si+0x0b54], ax              ~= 0000:32e9/0032e9: mov [si+0x0b96], ax ; var_89 / ?
0000:18b1/0018b1: mov word [si+0x0b56], 0x80       ~= 0000:32ed/0032ed: mov word [si+0x0b98], 0x80 ; var_90 / ?
0000:18b7/0018b7: mov ax, 0x100                    == 0000:32f3/0032f3: mov ax, 0x100
0000:18ba/0018ba: push ax                          == 0000:32f6/0032f6: push ax
0000:18bb/0018bb: call 0xd200 (0xb945 down)        ~= 0000:32f7/0032f7: call 0x4000 (0xd09 down) ; randlmul / sub_1D200
0000:18be/0018be: add sp, 0x2                      == 0000:32fa/0032fa: add sp, 0x2
0000:18c1/0018c1: mov ch, al                       == 0000:32fd/0032fd: mov ch, al
0000:18c3/0018c3: sub cl, cl                       == 0000:32ff/0032ff: sub cl, cl
0000:18c5/0018c5: mov [si+0x0b58], cx              != 0000:3301/003301: mov bx, [bp-0x04] ; var_91 / ? ; 💥💥💥
<r>ERROR: Instruction mismatch in routine sub_11841 at 0000:18c5/0018c5: mov [si+0x0b58], cx != 0000:3301/003301: mov bx, [bp-0x04]</r>
--- Context information for up to 10 additional instructions of routine sub_11841 after mismatch location:
0000:18c9/0018c9: mov ax, [bp-0x04]                != 0000:3304/003304: mov ax, cx
0000:18cc/0018cc: mov [0xb92], ax                  != 0000:3306/003306: mov cl, 0x3
0000:18cf/0018cf: pop si                           != 0000:3308/003308: shl bx, cl
0000:18d0/0018d0: pop di                           != 0000:330a/00330a: mov [bx+0x0b9a], ax
0000:18d1/0018d1: mov sp, bp                       != 0000:330e/00330e: mov ax, [bp-0x04]
0000:18d3/0018d3: pop bp                           != 0000:3311/003311: mov [0xbd4], ax
0000:18d4/0018d4: ret                              != 0000:3314/003314: pop si
</pre>

<p>It seems like for some reason the compiler decided to recalculate the location of an item in an array of structs, which it already had in <code class="language-plaintext highlighter-rouge">si</code>. This is a good time to actually look at the relevant C code:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="kt">int</span> <span class="nf">sub_11841</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">p</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">a</span><span class="p">;</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">word_336F6</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">p</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">p</span> <span class="o">&lt;</span> <span class="mi">8</span><span class="p">;</span> <span class="n">p</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="p">((</span><span class="k">struct</span> <span class="nc">struc_9</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_33402</span><span class="p">)[</span><span class="n">p</span><span class="p">].</span><span class="n">field_4</span> <span class="o">+=</span> <span class="mh">0x0a</span><span class="p">;</span>
            <span class="p">((</span><span class="k">struct</span> <span class="nc">struc_9</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_33402</span><span class="p">)[</span><span class="n">p</span><span class="p">].</span><span class="n">field_2</span> <span class="o">+=</span> <span class="p">((</span><span class="k">struct</span> <span class="nc">struc_9</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_33402</span><span class="p">)[</span><span class="n">p</span><span class="p">].</span><span class="n">field_4</span> <span class="o">&gt;&gt;</span> <span class="mi">9</span><span class="p">;</span>
            <span class="o">*</span><span class="p">(((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="p">((</span><span class="k">struct</span> <span class="nc">struc_9</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_33402</span><span class="p">)[</span><span class="n">p</span><span class="p">].</span><span class="n">field_6</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+=</span> <span class="mi">6</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="p">((</span><span class="kt">char</span><span class="p">)</span><span class="n">word_336E8</span> <span class="o">&amp;</span> <span class="mh">0x0f</span><span class="p">))</span> <span class="p">{</span>
            <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">word_336E8</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">7</span><span class="p">;</span>
            <span class="p">((</span><span class="k">struct</span> <span class="nc">struc_9</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_33402</span><span class="p">)[</span><span class="n">a</span><span class="p">].</span><span class="n">field_0</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="n">int16</span> <span class="o">*</span><span class="p">)((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_3AA5E</span> <span class="o">+</span> <span class="n">word_336F6</span> <span class="o">*</span> <span class="mi">16</span><span class="p">);</span>
            <span class="p">((</span><span class="k">struct</span> <span class="nc">struc_9</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_33402</span><span class="p">)[</span><span class="n">a</span><span class="p">].</span><span class="n">field_2</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="n">int16</span> <span class="o">*</span><span class="p">)((</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_3AA5E</span> <span class="o">+</span> <span class="n">word_336F6</span> <span class="o">*</span> <span class="mi">16</span> <span class="o">+</span> <span class="mi">2</span><span class="p">);</span>
            <span class="p">((</span><span class="k">struct</span> <span class="nc">struc_9</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_33402</span><span class="p">)[</span><span class="n">a</span><span class="p">].</span><span class="n">field_4</span> <span class="o">=</span> <span class="mh">0x80</span><span class="p">;</span>
            <span class="p">((</span><span class="k">struct</span> <span class="nc">struc_9</span> <span class="o">*</span><span class="p">)</span><span class="n">stru_33402</span><span class="p">)[</span><span class="n">a</span><span class="p">].</span><span class="n">field_6</span> <span class="o">=</span> <span class="n">sub_1D200</span><span class="p">(</span><span class="mh">0x100</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="mi">8</span><span class="p">;</span> <span class="c1">// pointer to `stru_33402[a]` recalculated here for some reason</span>
            <span class="n">word_33442</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span></code></pre></figure>

<p>This is not pretty, because it came from decompilation and has not been refactored yet. But here’s the mind blowing stuff: this code matched before <code class="language-plaintext highlighter-rouge">sub_160D3</code> was reconstructed above it. Then it stopped matching. Then, after <code class="language-plaintext highlighter-rouge">sub_160D3</code> was moved to the end of the file, <code class="language-plaintext highlighter-rouge">sub_11841</code> matched again.</p>

<p>In other words, the assumption that routines are processed by the compiler in isolation, and no routine can influence another, has been shattered. Clearly, there is some potential for state to propagate between routines (within the same C file) and influence the way they are put together by the compiler. Sorry, no matter how many times I say it, it still strikes me. When reconstructing code for MS C, your efforts may be derailed by invisible factors and no amount of beating your head against the wall will help.</p>

<p>Now, it’s not entirely all bad. There are some hints to what may be happening. @AJenbo had his LLM minions analyze this and the takeaway seems to be that it has to do with referencing far symbols, with 2+ references in a preceding routine triggering this effect. The LLM described it as a register spill, which I don’t think is entirely accurate (a spill is when the compiler writes a register value to temporary storage on the stack because it ran out of registers). It’s actually the reverse: a register value that could be reused isn’t, and the compiler feels obligated to flush the value and recalculate.</p>

<p>Also, adding routines which only reference near symbols appears to reset the state. I’m not sure how accurate this analysis really is, because it seems like the order the compiler sees globals in could also be a factor, but at least it seems fiddling with the order of routines (and/or globals?) can fix this. Additionally, I think the size of the C source file might be a factor. This happened in an almost 2k LOC file, and I’m wondering whether the compiler is running into memory pressure with big files, leaving no room for the symbol table. Remember, this is 1989 and extended memory isn’t really popular in DOS, so the compiler needs to fit itself, the source and everything else in conventional memory (640k). Also suspicious is the fact that compiling the file yielded this warning:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>egame1.c(1920) : warning C4073: scoping too deep, deepest scoping merged when debugging
</code></pre></div></div>

<p>The warning <code class="language-plaintext highlighter-rouge">C4073</code> is described as follows in the compiler documentation:</p>

<blockquote>
  <p>Declarations appeared at a static nesting level greater than 13. As a result, all declarations will seem to appear at the same level. (1)</p>
</blockquote>

<p>The compiler seems to believe the code reached the thirteenth level of nesting at the last line of the file (1920). There’s definitely something weird happening here. As an experiment, I took half of the routines from <code class="language-plaintext highlighter-rouge">egame1.c</code> and moved them to a different, almost empty file, while keeping the “bad” order that triggered the issue. The warning disappeared and the code started matching again. So maybe it has to do with file size after all?</p>

<p>Another question is why I never saw this when reconstructing <code class="language-plaintext highlighter-rouge">start.exe</code>, and why we didn’t hit it with the <code class="language-plaintext highlighter-rouge">end.exe</code> reconstructed by @AJenbo. Both are smaller than <code class="language-plaintext highlighter-rouge">egame.exe</code>, which might be a factor. We might also have been lucky. Besides, for <code class="language-plaintext highlighter-rouge">start.exe</code> I think I mostly kept the routines in the same order as they were present in the original executable, which might have mitigated the issue, or made it show up in the same locations where it did in the original. So that’s another possibility for a mitigation.</p>

<p>Bottom line, when facing compiler entropy from MS C, try the following:</p>

<ol>
  <li>Try to organize the routines in the same order as in the original file being reconstructed.</li>
  <li>Split large C files into smaller ones.</li>
  <li>Move routines and/or global variables around, particularly paying attention to near vs far symbol usage. But this is brittle and can break again after adding more routines.</li>
</ol>

<p>So I guess that’s it. As I’m writing this, some critical bugs have been fixed in <code class="language-plaintext highlighter-rouge">egame.exe</code> by @AJenbo and it looks pretty playable now. We still have a couple routines to complete, and a few bugs, but I’m confident these will be possible to overcome.</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry><entry><title type="html">More progress on EGAME</title><link href="/f15-se2/2025/04/07/egame10.html" rel="alternate" type="text/html" title="More progress on EGAME" /><published>2025-04-07T00:00:00+00:00</published><updated>2025-04-07T00:00:00+00:00</updated><id>/f15-se2/2025/04/07/egame10</id><content type="html" xml:base="/f15-se2/2025/04/07/egame10.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<p>I am happy to say the reconstruction for <code class="language-plaintext highlighter-rouge">EGAME.EXE</code> is progressing smoothly. I was planning to do an update after reaching 10% completion, but I work in increments of routines, and finishing the previous one only took me past 9%, so I needed to do one more before celebrating. It was a pretty hefty one (<code class="language-plaintext highlighter-rouge">sub_18E50</code>, at least called that for now), so I’m happy to report we are actually sitting around 14%, with about 10k of opcodes transcribed into C and 52k to go:</p>

<pre>
--- Routine map stats (static):
Load module of executable is 167792/0x28f70 bytes
Routine map of 400 routines covers 71986/0x11932 bytes (42% of the load module)
Reachable code totals 73063/0x11d67 bytes (101% of the mapped area)
Unreachable code totals 202/0xca bytes (0% of the mapped area)
Excluded 122 routines take 5104/0x13f0 bytes (7% of the mapped area)
Reachable area of excluded routines is 5281/0x14a1 bytes (7% of the reachable area)
--- Comparison run stats (dynamic):
Seen 82 routines, visited 49 and compared 9662/0x25be bytes of opcodes inside (13% of the reachable area)
Ignored (seen but excluded) 33 routines totaling 1122/0x462 bytes (1% of the reachable area)
Practical coverage (visited &amp; compared + ignored) is 10784/0x2a20 (<g>14%</g> of the reachable area)
Theoretical(*) coverage (visited &amp; compared + reachable excluded area) is 14943/0x3a5f (<g>20%</g> of the reachable area)
Missed (not seen and not excluded) 229 routines totaling 52517/0xcd25 bytes (<r>72%</r> of the covered area)
(*) Any routines called only by ignored routines have not been seen and will lower the practical score,
    but theoretically if we stepped into ignored routines, we would have seen and ignored any that were excluded.
</pre>

<p>Mind that some of that 52k will turn out to be assembly, and so not a target for the reconstruction per se, but it will need porting to C anyway. So ultimately, more work, just later.</p>

<p>Seeing as it’s been almost exactly a month since I actually started <a href="/f15-se2/2025/03/06/back-egame.html">transcribing the code</a> (it was a lot of preparatory work to get the new executable building in my framework before that), that means that I’m able to do roughly 10k in a month, so theoretically if I went all in, I could probably finish the transcription within 5-6 months. Incidentally, this 10k/month lines up with what Fabien Sanglard based his calculations on when considering whether <a href="https://fabiensanglard.net/reverse_engineering_strike_commander/index.php">reversing Strike Commander</a> would be possible, before deciding it would take too many years. I think he might have been a bit pessimistic, because a lot of those binaries’ size is bound to be data, and some might be libc… But I’m still happy my game is smaller. 😁</p>

<p>So, will <code class="language-plaintext highlighter-rouge">EGAME</code> be reconstructed around October? I really doubt it, it’s probably going to take a year or more. I actually plan to take it easy for a while, with summer around the corner and more family functions and activities that inevitably brings. I also want to play a few games that I bought on Steam a while back but didn’t even have time to check out. I’m not likely to stop completely, I’m just managing expectations (my own, mostly 😉) to clarify that I’m not going to be able to maintain this pace. But the future is looking bright, so don’t fret.</p>

<p>With that out of the way, let’s look at some interesting code snippets that have popped up while doing the reconstruction.</p>

<h2 id="bxsi">[bx+si]</h2>

<p>I found this in the routine which loads <code class="language-plaintext highlighter-rouge">.3dt</code> (terrain) files. I don’t understand the format for now, but these functions will go a long way towards figuring it out one day.</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm">    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_8</span><span class="p">]</span>
    <span class="c1">; ...</span>
    <span class="nf">mov</span>	    <span class="nb">si</span><span class="p">,</span>	<span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_4</span><span class="p">]</span>
    <span class="nf">mov</span>	    <span class="nb">cl</span><span class="p">,</span>	<span class="mi">6</span>
    <span class="nf">shl</span>	    <span class="nb">si</span><span class="p">,</span>	<span class="nb">cl</span>
    <span class="nf">mov</span>	    <span class="nb">bx</span><span class="p">,</span>	<span class="nb">ax</span>
    <span class="nf">shl</span>	    <span class="nb">bx</span><span class="p">,</span>	<span class="mi">1</span>
    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_6</span><span class="p">]</span>
    <span class="nf">add</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="nv">offset</span> <span class="nv">buf1_3dt</span>
    <span class="nf">mov</span>	    <span class="kt">word</span><span class="nv">_1234</span><span class="p">[</span><span class="nb">bx</span><span class="o">+</span><span class="nb">si</span><span class="p">],</span>	<span class="nb">ax</span>
    <span class="nf">mov</span>	    <span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_A</span><span class="p">],</span>	<span class="mi">0</span></code></pre></figure>

<p>After putting one index into <code class="language-plaintext highlighter-rouge">si</code> and another into <code class="language-plaintext highlighter-rouge">bx</code>, the code reaches into what looks like an array of integers using the <code class="language-plaintext highlighter-rouge">[bx+si]</code> memory addressing mode. But why didn’t it just build the index in one register? I tried various ways of writing the indexing expression before realizing this is actually a 2-dimensional array. The first shift by 6 multiplies the row index in <code class="language-plaintext highlighter-rouge">var_4</code> by 64, which is the length of a row (32) times the size of an element (16-bit integer: 2), then another shift multiplies the column index in <code class="language-plaintext highlighter-rouge">var_8</code> by 2 to get the final offset. It’s actually lucky that it uses a different indexing mode for matrices, else I might have had difficulties recognizing it as such. Now I can rename and declare <code class="language-plaintext highlighter-rouge">word_1234</code> as something more meaningful:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">extern</span> <span class="kt">int</span> <span class="n">matrix3dt_2</span><span class="p">[</span><span class="mi">5</span><span class="p">][</span><span class="mi">32</span><span class="p">];</span>

<span class="kt">void</span> <span class="nf">load3DT</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">arg_0</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// ...</span>
    <span class="n">matrix3dt_2</span><span class="p">[</span><span class="n">var_4</span><span class="p">][</span><span class="n">var_8</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">int16</span><span class="p">)(</span><span class="n">buf1_3dt</span> <span class="o">+</span> <span class="n">var_6</span><span class="p">);</span>
    <span class="c1">// ...</span>
<span class="p">}</span></code></pre></figure>
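<p>To double-check the offset arithmetic, here is a small sketch that builds the byte offset the way the assembly does and compares it with what a C compiler computes for the same 2-dimensional index. The array and function names are placeholders of mine, and <code class="language-plaintext highlighter-rouge">int16_t</code> stands in for the 16-bit <code class="language-plaintext highlighter-rouge">int</code> of the original compiler, so treat it as an illustration rather than code from the project.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define ROWS 5
#define COLS 32

/* Placeholder for matrix3dt_2; int16_t mimics the 16-bit int of MS C. */
static int16_t matrix[ROWS][COLS];

/* Byte offset built like the assembly: si = row << 6, bx = col << 1, [bx+si]. */
static size_t offset_from_shifts(size_t row, size_t col)
{
    size_t si = row << 6; /* row * 64 bytes (32 elements of 2 bytes each) */
    size_t bx = col << 1; /* col * 2 bytes */
    return bx + si;
}

/* Byte offset as the C compiler computes it for matrix[row][col]. */
static size_t offset_from_indexing(size_t row, size_t col)
{
    return (size_t)((const char *)&matrix[row][col] - (const char *)&matrix[0][0]);
}
```

<p>For example, both functions give 206 for row 3, column 7 (3 * 64 + 7 * 2), and they agree for every valid index of a 5x32 array of 2-byte elements.</p>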

<h2 id="sub-sbb-and-add-and-what">sub-sbb-and-add-and-…what?</h2>

<p>This time we are in the <code class="language-plaintext highlighter-rouge">.3d3</code> file loading routine; I suspect these files are 3D models. Either way:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm">    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="nb">si</span><span class="nv">ze3d3_2</span>
    <span class="nf">sub</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">800h</span>
    <span class="nf">sbb</span>	    <span class="nb">cx</span><span class="p">,</span>	<span class="nb">cx</span>
    <span class="nf">and</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="nb">cx</span>
    <span class="nf">add</span>	    <span class="nb">ah</span><span class="p">,</span>	<span class="mi">8</span>
    <span class="nf">mov</span>	    <span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_A</span><span class="p">],</span>	<span class="nb">ax</span></code></pre></figure>

<p>We’ve seen <code class="language-plaintext highlighter-rouge">sub-sbb-neg</code> used by the compiler before as a way to perform branchless <code class="language-plaintext highlighter-rouge">NULL</code> checks, so that’s a hint it might be trying something similar. I’m using my old trusted technique of plotting the values for the significant cases. Here, it’s clearly trying to compare something against <code class="language-plaintext highlighter-rouge">0x800</code>, so let’s pick one value below, and one above it:</p>

<table>
  <thead>
    <tr>
      <th>Instruction</th>
      <th>Value (ax=0x1234)</th>
      <th>Value (ax=0x200)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>mov ax, size3d3_2</td>
      <td>ax=0x1234</td>
      <td>ax=0x200</td>
    </tr>
    <tr>
      <td>sub ax, 800h</td>
      <td>ax=0xa34, no borrow</td>
      <td>ax=0xfa00, borrow</td>
    </tr>
    <tr>
      <td>sbb cx, cx</td>
      <td>cx=0</td>
      <td>cx=0xffff</td>
    </tr>
    <tr>
      <td>and ax, cx</td>
      <td>ax=0</td>
      <td>ax=0xfa00</td>
    </tr>
    <tr>
      <td>add ah, 8</td>
      <td>ax=0x800</td>
      <td>ax=0x200</td>
    </tr>
  </tbody>
</table>

<p>Seeing the values makes it crystal clear that it’s just clamping the value to the range <code class="language-plaintext highlighter-rouge">0-0x800</code>. This simple code matches the binary arithmetic mess exactly:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">var_A</span> <span class="o">=</span> <span class="p">(</span><span class="n">size3d3_2</span> <span class="o">&gt;=</span> <span class="mh">0x800</span><span class="p">)</span> <span class="o">?</span> <span class="mh">0x800</span> <span class="o">:</span> <span class="n">size3d3_2</span><span class="p">;</span></code></pre></figure>
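<p>The table can also be replayed in code. The following is a rough sketch that emulates the four instructions on a 16-bit value, so it can be compared against the ternary expression. The function names are mine, and the carry flag is modeled with a plain comparison, which is an assumption about how to express <code class="language-plaintext highlighter-rouge">sbb cx, cx</code> in C rather than anything the compiler emits.</p>

```c
#include <stdint.h>

/* Emulates: sub ax, 800h / sbb cx, cx / and ax, cx / add ah, 8 */
static uint16_t clamp_like_the_asm(uint16_t size)
{
    uint16_t ax = (uint16_t)(size - 0x800u);           /* sub ax, 800h */
    uint16_t cx = (size < 0x800u) ? 0xFFFFu : 0x0000u; /* sbb cx, cx: all ones on borrow */
    ax = (uint16_t)(ax & cx);                          /* and ax, cx */
    /* add ah, 8 touches only the high byte; modulo 2^16 this is ax + 0x800 */
    ax = (uint16_t)(ax + 0x800u);
    return ax;
}

/* The C expression the sequence was matched to. */
static uint16_t clamp_like_the_c(uint16_t size)
{
    return (size >= 0x800u) ? 0x800u : size;
}
```

<p>Both return <code class="language-plaintext highlighter-rouge">0x800</code> for <code class="language-plaintext highlighter-rouge">0x1234</code> and <code class="language-plaintext highlighter-rouge">0x200</code> for <code class="language-plaintext highlighter-rouge">0x200</code>, same as the table.</p>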

<h2 id="i-really-need-a-clever-heading-for-this">I really need a clever heading for this</h2>

<p>Still in the <code class="language-plaintext highlighter-rouge">.3d3</code> routine:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm">    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_A</span><span class="p">]</span> <span class="c1">; part of an earlier calculation</span>
    <span class="nf">add</span>	    <span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_E</span><span class="p">],</span>	<span class="nb">ax</span>
    <span class="nf">cmp</span>	    <span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_10</span><span class="p">],</span> <span class="mi">0</span>
    <span class="nf">jnz</span>	    <span class="nv">short</span> <span class="nv">loc_12C56</span>
    <span class="nf">mov</span>	    <span class="nb">si</span><span class="p">,</span>	<span class="nb">si</span><span class="nv">ze3d3</span>
    <span class="nf">shl</span>	    <span class="nb">si</span><span class="p">,</span>	<span class="mi">1</span>
    <span class="nf">add</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="nv">buf3d3</span><span class="p">[</span><span class="nb">si</span><span class="p">]</span> <span class="c1">; new calculation reuses value of var_A already in ax</span>
    <span class="nf">mov</span>	    <span class="p">(</span><span class="nv">buf3d3</span><span class="o">+</span><span class="mi">2</span><span class="p">)[</span><span class="nb">si</span><span class="p">],</span> <span class="nb">ax</span></code></pre></figure>

<p>The code is pretty simple:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">var_E</span> <span class="o">+=</span> <span class="n">var_A</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">var_10</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// 2c52</span>
    <span class="n">buf3d3</span><span class="p">[</span><span class="n">size3d3</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">buf3d3</span><span class="p">[</span><span class="n">size3d3</span><span class="p">]</span> <span class="o">+</span> <span class="n">var_A</span><span class="p">;</span>
<span class="p">}</span> <span class="c1">// 2c56</span></code></pre></figure>

<p>…but the question is how to get the value of stack variable <code class="language-plaintext highlighter-rouge">var_A</code> in register <code class="language-plaintext highlighter-rouge">ax</code> from before the conditional jump to propagate to the addition operation within the conditional code. I tried in vain, but the compiler would reload the value of <code class="language-plaintext highlighter-rouge">buf3d3[si]</code> into ax, and add <code class="language-plaintext highlighter-rouge">var_A</code> to it.</p>

<p>I really don’t remember too well how I got the idea, but when all else fails, try flipping signedness. Changing the declaration of <code class="language-plaintext highlighter-rouge">buf3d3</code> into <code class="language-plaintext highlighter-rouge">extern unsigned int buf3d3[]</code> solves this one, but don’t ask me why.</p>

<h2 id="havent-had-enough-binary-arithmetic-magic-yet">Haven’t had enough binary arithmetic magic yet?</h2>

<p>This routine ostensibly processes the cases for view switching in the plane:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm">    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="kt">word</span><span class="nv">_330C4</span>
    <span class="nf">inc</span>	    <span class="nb">ax</span>
    <span class="nf">cwd</span>
    <span class="nf">sub</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="nb">dx</span>
    <span class="nf">sar</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mi">1</span>
    <span class="nf">mov</span>	    <span class="nb">cx</span><span class="p">,</span>	<span class="kt">word</span><span class="nv">_336E8</span>
    <span class="nf">sub</span>	    <span class="nb">cx</span><span class="p">,</span>	<span class="nb">ax</span>
    <span class="nf">dec</span>	    <span class="nb">cx</span>
    <span class="nf">and</span>	    <span class="nb">cx</span><span class="p">,</span>	<span class="mh">0Fh</span>
    <span class="nf">mov</span>	    <span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">var_E</span><span class="p">],</span>	<span class="nb">cx</span></code></pre></figure>

<p>After seeing this, I got a bit nauseous, so I decided it was time I had a stimulating conversation with my pal ChatGPT. I mean, obviously we’re trying to do some operation on a doubleword, but why subtract the high half from the low one? Anyway, they told me that:</p>

<pre>
sub ax, dx
    This is the interesting part.
    Since dx is either 0x0000 (for positive ax) or 0xFFFF (for negative ax), the subtraction effectively does:
        If ax was non-negative (dx = 0x0000), then ax remains unchanged.
        If ax was negative (dx = 0xFFFF), then ax = ax - (-1) = ax + 1.
            This effectively cancels the inc ax instruction for negative values.
This adjustment ensures that the rounding behavior for division by 2 is more symmetric.
Normally, integer division truncates toward zero, but this modification makes negative numbers round more correctly toward the mathematical floor.
Without sub ax, dx, a negative odd value would round incorrectly due to simple truncation.
</pre>

<p>Are they right? Who knows, right? It sounds so smart that I’m inclined to believe it. 😉 In any case, my buddy kindly made a table of values just like I enjoy, and it looks like it really is just a way to get division with <code class="language-plaintext highlighter-rouge">sar</code> to line up with C’s signed division. I should really know better by now that seeing <code class="language-plaintext highlighter-rouge">dx:ax</code> does not always mean a doubleword; sometimes it’s just plain division. There’s really nothing interesting to the matching code:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">var_E</span> <span class="o">=</span> <span class="p">(</span><span class="n">word_336E8</span> <span class="o">-</span> <span class="p">((</span><span class="n">word_330C4</span>  <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xf</span><span class="p">;</span></code></pre></figure>
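
<p>For anyone who would rather check this than trust a chatbot, the instruction sequence can be modelled in plain C. This is only a sketch under stated assumptions: the helper names (<code class="language-plaintext highlighter-rouge">sar1</code>, <code class="language-plaintext highlighter-rouge">asm_half</code>, <code class="language-plaintext highlighter-rouge">c_half</code>, <code class="language-plaintext highlighter-rouge">halves_agree</code>) are made up for illustration, registers are modelled as plain <code class="language-plaintext highlighter-rouge">int</code>s, and the C side assumes division that truncates toward zero (guaranteed since C99).</p>

```c
#include <assert.h>

/* Arithmetic shift right by one, i.e. floor(v / 2). Written out by hand
   because >> on a negative int is implementation-defined in C. */
int sar1(int v) {
    return (v >= 0) ? v / 2 : -((-v + 1) / 2);
}

/* Hypothetical model of: inc ax / cwd / sub ax, dx / sar ax, 1 */
int asm_half(int x) {
    int ax = x + 1;              /* inc ax */
    int dx = (ax < 0) ? -1 : 0;  /* cwd: dx becomes 0 or 0xFFFF (-1) */
    ax -= dx;                    /* sub ax, dx: adds 1 only when ax < 0 */
    return sar1(ax);             /* sar ax, 1 */
}

/* What the matching C source says (assumes truncation toward zero). */
int c_half(int x) {
    return (x + 1) / 2;
}

/* Returns 1 if both agree over the 16-bit range where x + 1 fits. */
int halves_agree(void) {
    int x;
    for (x = -32768; x <= 32766; x++) {
        if (asm_half(x) != c_half(x)) {
            return 0;
        }
    }
    return 1;
}
```

<p>In this model the extra <code class="language-plaintext highlighter-rouge">sub ax, dx</code> only changes negative values, and it is what turns the flooring <code class="language-plaintext highlighter-rouge">sar</code> into the round-toward-zero result that signed <code class="language-plaintext highlighter-rouge">/ 2</code> produces.</p>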

<h2 id="divsion-came-knocking-again">Division came knocking again</h2>

<p>I don’t even know what this routine does (yet). Within, we have this function call with a conditional in the middle of the arguments getting pushed onto the stack.</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm">    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">0Fh</span>
    <span class="nf">push</span>    <span class="nb">ax</span>
    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">36h</span> <span class="c1">; '6'</span>
    <span class="nf">push</span>    <span class="nb">ax</span>
    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">0E4h</span>
    <span class="nf">push</span>    <span class="nb">ax</span>
    <span class="nf">cmp</span>	    <span class="kt">word</span><span class="nv">_380D0</span><span class="p">,</span>	<span class="mh">64h</span> <span class="c1">; 'd'</span>
    <span class="nf">jnb</span>	    <span class="nv">short</span> <span class="nv">loc_192D5</span>
    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="kt">word</span><span class="nv">_380D0</span>
    <span class="nf">jmp</span>	    <span class="nv">short</span> <span class="nv">loc_192E7</span>
<span class="nl">loc_192D5:</span>
    <span class="nf">mov</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="kt">word</span><span class="nv">_380D0</span>
    <span class="nf">sub</span>	    <span class="nb">dx</span><span class="p">,</span>	<span class="nb">dx</span>  <span class="c1">; extend into dx</span>
    <span class="nf">mov</span>	    <span class="nb">cx</span><span class="p">,</span>	<span class="mi">5</span> <span class="c1">; ...divide by 5</span>
    <span class="nf">div</span>	    <span class="nb">cx</span>
    <span class="nf">mov</span>	    <span class="nb">cx</span><span class="p">,</span>	<span class="nb">ax</span> 
    <span class="nf">shl</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mi">1</span> 
    <span class="nf">shl</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mi">1</span> <span class="c1">; ...multiply by 4</span>
    <span class="nf">add</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="nb">cx</span> <span class="c1">; ...add one more 🎵</span>
<span class="nl">loc_192E7:</span>
    <span class="nf">push</span>    <span class="nb">ax</span>
    <span class="nf">call</span>    <span class="nv">sub_1A183</span></code></pre></figure>

<p>The troublesome bit is the <code class="language-plaintext highlighter-rouge">add ax, cx</code>. For some reason, I kept writing it as:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">sub_1A183</span><span class="p">(</span><span class="n">word_380D0</span> <span class="o">&lt;</span> <span class="mh">0x64</span> <span class="o">?</span> <span class="n">word_380D0</span> <span class="o">:</span> <span class="p">(</span><span class="n">word_380D0</span> <span class="o">/</span> <span class="mi">5</span><span class="p">)</span> <span class="o">*</span> <span class="mi">4</span> <span class="o">+</span> <span class="p">(</span><span class="n">word_380D0</span> <span class="o">/</span> <span class="mi">5</span><span class="p">),</span> <span class="mh">0xe4</span><span class="p">,</span> <span class="mh">0x36</span><span class="p">,</span> <span class="mh">0xf</span><span class="p">);</span></code></pre></figure>

<p>This does not match, because the division gets emitted a second time before the addition. It took a while to click: multiply by 4, then add one more, and you get 5!</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">sub_1A183</span><span class="p">(</span><span class="n">word_380D0</span> <span class="o">&lt;</span> <span class="mh">0x64</span> <span class="o">?</span> <span class="n">word_380D0</span> <span class="o">:</span> <span class="p">(</span><span class="n">word_380D0</span> <span class="o">/</span> <span class="mi">5</span><span class="p">)</span> <span class="o">*</span> <span class="mi">5</span><span class="p">,</span> <span class="mh">0xe4</span><span class="p">,</span> <span class="mh">0x36</span><span class="p">,</span> <span class="mh">0xf</span><span class="p">);</span></code></pre></figure>

<p>This is a match. It looks dumb mathematically, which is why I initially rejected it, but I quickly remembered that this is integer division, so it strips the remainder off the value, rounding it down to a multiple of 5. It is essentially the equivalent of <code class="language-plaintext highlighter-rouge">word_380D0 - (word_380D0 % 5)</code>.</p>
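
<p>The equivalence is easy to confirm by brute force over all 16-bit values. The following is a minimal sketch, not code from the game: the function names are hypothetical, and the values are treated as unsigned because the listing uses <code class="language-plaintext highlighter-rouge">jnb</code> and <code class="language-plaintext highlighter-rouge">div</code>.</p>

```c
#include <assert.h>

/* Hypothetical model of the non-trivial branch in the listing. */
unsigned asm_branch(unsigned x) {
    unsigned q = x / 5;      /* sub dx, dx / mov cx, 5 / div cx */
    return (q << 2) + q;     /* shl ax, 1 twice, then add ax, cx */
}

/* The first argument as written in the matching C code. */
unsigned arg_value(unsigned x) {
    return x < 0x64 ? x : (x / 5) * 5;
}

/* Returns 1 if (x / 5) * 5 equals both the assembly model and
   x - (x % 5) for every 16-bit value. */
int rounding_agrees(void) {
    unsigned x;
    for (x = 0; x <= 0xFFFFu; x++) {
        if ((x / 5) * 5 != asm_branch(x)) {
            return 0;
        }
        if ((x / 5) * 5 != x - (x % 5)) {
            return 0;
        }
    }
    return 1;
}
```

<p>With this model, values below 100 pass through untouched, while for example 103 and 104 both come out as 100.</p>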

<h2 id="relax-that-was-the-last-one">Relax, that was the last one</h2>

<p>I wanted to conclude this entry with some general discoveries. Mind that I’m not only doing the reconstruction, but investigation in IDA to properly mark variables, add declarations for automatic C header generation etc. Usually, I try not to follow the rabbit holes too deep and focus on the routine that I’m currently looking at, but sometimes a piece of information is missing, and I need to search elsewhere. At some point, I was trying to figure out the layout of some struct data, but the code in the current routine was only accessing two members out of 8, so I had to cast a wider net. This way, I finally found myself in this interesting code:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="c1">; ...</span>
<span class="nl">loc_1D747:</span>
    <span class="nf">cmp</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">266Ch</span>
    <span class="nf">jnz</span>	    <span class="nv">short</span> <span class="nv">loc_1D74F</span>
    <span class="nf">jmp</span>	    <span class="nv">keyL_1D31D</span>
<span class="nl">loc_1D74F:</span>
    <span class="nf">jbe</span>	    <span class="nv">short</span> <span class="nv">loc_1D754</span>
    <span class="nf">jmp</span>	    <span class="nv">loc_1D7EE</span>
<span class="nl">loc_1D754:</span>
    <span class="nf">cmp</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">1970h</span>
    <span class="nf">jnz</span>	    <span class="nv">short</span> <span class="nv">loc_1D75C</span>
    <span class="nf">jmp</span>	    <span class="nv">keyP_1D605</span>
<span class="nl">loc_1D75C:</span>
    <span class="nf">ja</span>	    <span class="nv">short</span> <span class="nv">loc_1D79E</span>
    <span class="nf">cmp</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">1177h</span>
    <span class="nf">jnz</span>	    <span class="nv">short</span> <span class="nv">loc_1D766</span>
    <span class="nf">jmp</span>	    <span class="nv">keyW_1D5AF</span>
<span class="nl">loc_1D766:</span>
    <span class="nf">ja</span>	    <span class="nv">short</span> <span class="nv">loc_1D77B</span>
    <span class="nf">cmp</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">11Bh</span>
    <span class="nf">jnz</span>	    <span class="nv">short</span> <span class="nv">loc_1D770</span>
    <span class="nf">jmp</span>	    <span class="nv">keyEsc_1D6B6</span>
<span class="nl">loc_1D770:</span>
    <span class="nf">cmp</span>	    <span class="nb">ax</span><span class="p">,</span>	<span class="mh">0E08h</span>
    <span class="nf">jnz</span>	    <span class="nv">short</span> <span class="nv">loc_1D778</span>
    <span class="nf">jmp</span>	    <span class="nv">keyBkspc_1D641</span>
<span class="c1">; ...</span></code></pre></figure>

<p>This is a long series of checks of ax against seemingly arbitrary values which was originally probably a <code class="language-plaintext highlighter-rouge">switch</code> statement, but I realized these were key <a href="https://stanislavs.org/helppc/scan_codes.html">scan codes</a>. This is the routine for dispatching keypresses! This is immensely helpful, because I know what the keys do in the game, so I can infer (even if broadly) the purpose of the routines that are invoked by the cases. I did some initial poking around and found a bunch of interesting avenues for further research. As it is with IDA, figuring out one bit in one place will sometimes unlock whole other areas for you, and you keep on doing that until you’re done.</p>
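
<p>To illustrate what such a dispatcher might have looked like in C, here is a minimal sketch. It is not the game’s code: <code class="language-plaintext highlighter-rouge">key_name</code>, <code class="language-plaintext highlighter-rouge">scan_code</code>, <code class="language-plaintext highlighter-rouge">ascii_code</code> and the <code class="language-plaintext highlighter-rouge">KEY_*</code> macros are hypothetical names, and only the five values visible in the excerpt are covered. The values follow the BIOS keyboard convention of scan code in the high byte and ASCII character in the low byte. A sparse <code class="language-plaintext highlighter-rouge">switch</code> like this is commonly lowered by compilers into a tree of comparisons, which would be consistent with the <code class="language-plaintext highlighter-rouge">cmp</code>/<code class="language-plaintext highlighter-rouge">jnz</code>/<code class="language-plaintext highlighter-rouge">ja</code> pattern in the listing, though that is an assumption about this particular compiler.</p>

```c
#include <assert.h>
#include <string.h>

/* Key values as they appear in the listing: BIOS scan code in the high
   byte, ASCII character in the low byte. Macro names are hypothetical. */
#define KEY_ESC   0x011B
#define KEY_BKSPC 0x0E08
#define KEY_W     0x1177
#define KEY_P     0x1970
#define KEY_L     0x266C

/* Hypothetical stand-in for the dispatcher; the real routine jumps to
   game code instead of returning a name. */
const char *key_name(unsigned key) {
    switch (key) {
    case KEY_ESC:   return "Esc";
    case KEY_BKSPC: return "Backspace";
    case KEY_W:     return "W";
    case KEY_P:     return "P";
    case KEY_L:     return "L";
    default:        return "unknown";
    }
}

/* Split a key value into its two halves. */
unsigned scan_code(unsigned key)  { return (key >> 8) & 0xFF; }
unsigned ascii_code(unsigned key) { return key & 0xFF; }
```

<p>For example, <code class="language-plaintext highlighter-rouge">0x266C</code> splits into scan code <code class="language-plaintext highlighter-rouge">0x26</code> and the character <code class="language-plaintext highlighter-rouge">'l'</code>, which is how the seemingly arbitrary constants map back to keys.</p>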

<h2 id="fat-routines">Fat routines</h2>

<p>Another thing I wanted to talk about is the size of the routines I’ve been encountering. Some are pretty significant. The last one I finished was almost 1600 bytes of code, which is not dramatic, but not trivial either, especially as it has multiple nested conditions and loops inside, which reminds me of how I used to need to do <a href="/f15-se2/2024/05/05/ghidra.html">desperate stuff</a> to get the control flow down. The current routine that I am supposed to do is over 4700 bytes! It’s one of the key dispatch routines I was excited about (there are actually two separate ones, not sure why yet). Anyway, this maybe hints at why I <a href="/f15-se2/2025/02/27/elephant.html">could not find</a> many routines from <em>Fleet Defender</em> in the F15SE2 code - a lot of this code appears to be huge, sprawling routines that cover many aspects of the game logic. Perhaps at some point there was a refactor of this codebase into more manageable, smaller bits, which would not be found with routine signatures, and would be of little help in reconstructing the code even if we <em>could</em> find them. Perhaps the old, ugly code was deemed too F15-specific, so it was thrown away as cruft at some point in the codebase’s lifetime, and did not carry over to either F15SE3, or some other step along the way. I’ll probably never know, but this kind of makes sense and closes the quest for duplicates against the F-14 code leak.</p>

<h2 id="transcription-101">Transcription 101</h2>

<p>By the way, how am I managing to write all this convoluted code without getting confused? Recently, while doing the reconstruction, I started following a sort-of formalized approach to transcribing the code. It’s a pretty small detail, but enough of a gamechanger for me that I wanted to mention it. Basically, while working on a routine, I will write out hex offsets of the corresponding assembly opcodes as comments in the source code. Not for every line, but it’s very helpful if I ever need to go back from an offset to a place in the C code, which pops up pretty often. Anyway, this isn’t new, I’ve been doing it since the beginning. What I started doing was writing an offset comment on every opening and closing brace, and closing the braces immediately when I encounter them:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="kt">int</span> <span class="nf">sub_18E50</span><span class="p">(</span><span class="kt">int</span> <span class="n">arg_0</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">var_2</span><span class="p">,</span> <span class="n">var_4</span><span class="p">,</span> <span class="n">var_6</span><span class="p">,</span> <span class="n">var_8</span><span class="p">,</span> <span class="n">var_A</span><span class="p">,</span> <span class="n">var_C</span><span class="p">,</span> <span class="n">var_E</span><span class="p">,</span> <span class="n">var_10</span><span class="p">,</span> <span class="n">var_12</span><span class="p">,</span> <span class="n">var_14</span><span class="p">,</span> <span class="n">var_16</span><span class="p">,</span> <span class="n">var_18</span><span class="p">,</span> <span class="n">var_1A</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">var_1C</span><span class="p">;</span>
    <span class="n">byte_3C5A0</span> <span class="o">=</span> <span class="n">gfx_jump_2d</span><span class="p">();</span>
    <span class="n">var_16</span> <span class="o">=</span> <span class="n">waypoints</span><span class="p">[</span><span class="n">waypointIndex</span><span class="p">].</span><span class="n">field_0</span> <span class="o">-</span> <span class="n">word_3BEC0</span><span class="p">;</span>
    <span class="n">var_1A</span> <span class="o">=</span> <span class="n">waypoints</span><span class="p">[</span><span class="n">waypointIndex</span><span class="p">].</span><span class="n">field_2</span> <span class="o">-</span> <span class="n">word_3BED0</span><span class="p">;</span>
    <span class="c1">// 8e83</span>
    <span class="n">word_3BE92</span> <span class="o">=</span> <span class="n">sub_1D008</span><span class="p">(</span><span class="n">var_16</span><span class="p">,</span> <span class="o">-</span><span class="n">var_1A</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">word_330C2</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 8e96</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">word_38FEA</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 8e9d</span>
            <span class="n">word_38FEA</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
            <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">keyValue</span> <span class="o">&amp;</span> <span class="mh">0x80</span><span class="p">))</span> <span class="p">{</span> <span class="c1">// 8eaa</span>
                <span class="n">sub_19E44</span><span class="p">(</span><span class="mh">0xd</span><span class="p">);</span>
                <span class="n">sub_19E5D</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mh">0x13f</span><span class="p">,</span> <span class="mh">0x60</span><span class="p">);</span>
                <span class="n">gfx_jump_4f</span><span class="p">(</span><span class="mh">0x3c</span><span class="p">);</span>
            <span class="p">}</span> <span class="c1">// 🔵 nothing here because it's the same as below, 0x8ed2</span>
        <span class="p">}</span> <span class="c1">// 8ed2</span>
        <span class="n">byte_37C2F</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">keyValue</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">byte_37C24</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 8eeb</span>
            <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">commData</span><span class="o">-&gt;</span><span class="n">setupUseJoy</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// 8ef9</span>
                <span class="n">sub_19E44</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
                <span class="c1">// 🟢🟢🟢 working here</span>
            <span class="p">}</span> <span class="c1">// 8fce                </span>
        <span class="p">}</span> <span class="c1">// 93c4 // 🔵 have all these marked out in advance</span>
    <span class="p">}</span> <span class="c1">// 93cf // 🔵</span>
<span class="p">}</span> <span class="c1">// 9485 // 🔵</span></code></pre></figure>

<p>Having these braces fixed at both ends as soon as I encounter the opening one lets me keep my bearings even inside pretty complex control structures. When I see a location in the IDA listing, I can immediately check if it matches some of my currently open blocks’ ending, and hence that I should move out of its scope. Likewise, I have a convention for IDA, where I rename its <code class="language-plaintext highlighter-rouge">loc_01234</code> labels to <code class="language-plaintext highlighter-rouge">if/else/endif/loop/..._01234</code> to make the listing more readable, but that’s not as important in my view, and I only do it where it’s especially difficult to figure stuff out.</p>

<p>This of course assumes unoptimized code without any nastiness like deduplication or code block reordering. But luckily enough, most of the code I’ve seen so far has been compiled with the <a href="/f15-se2/2023/09/02/compiler3.html">elusive /Zi flag</a>, so the opcodes pretty much follow the C code one-to-one. But I’m sure something will surprise me one day. Oh well, even in such a case, following this pattern for as long as possible lets me know something is afoot when the jump sequence does not make sense.</p>

<p>Thanks for reading, I’ll update around the 20-30% mark or if there is anything of interest.</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry><entry><title type="html">Back in the (e)game</title><link href="/f15-se2/2025/03/06/back-egame.html" rel="alternate" type="text/html" title="Back in the (e)game" /><published>2025-03-06T00:00:00+00:00</published><updated>2025-03-06T00:00:00+00:00</updated><id>/f15-se2/2025/03/06/back-egame</id><content type="html" xml:base="/f15-se2/2025/03/06/back-egame.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<p>This is just a short status update to let anybody interested know that after spending <a href="/f15-se2/2024/12/31/newtooling.html">too</a> <a href="/f15-se2/2025/01/29/newtooling2.html">much</a> time on improvements and bugfixes for my tooling, I’m back to reconstructing the actual game, finally branching into the next executable (<code class="language-plaintext highlighter-rouge">EGAME.EXE</code>) since the previous one is <a href="/f15-se2/2025/01/09/start-runs.html">mostly working</a>. In fact, I just finished <a href="https://github.com/neuviemeporte/f15se2-re/blob/main/src/egame0.c">reconstructing the <code class="language-plaintext highlighter-rouge">main()</code> routine</a>, so I guess this is cause to celebrate. Yay!</p>

<p>Part of the reason this took so long is the fact I was procrastinating because setting up a new executable with my tooling is a bit of a chore. What I basically did was:</p>

<ol>
  <li>Initial research in and around <code class="language-plaintext highlighter-rouge">main()</code> in IDA, named some routines and variables including the overlay driver trampoline routines, exported the listing</li>
  <li>Created a config file (<code class="language-plaintext highlighter-rouge">egame_rc.json</code>) for my listing-slicing tools, telling it to tweak segment definitions, export public symbols etc. in the autogenerated files (C header and assembly file holding the not-reconstructed routine stubs as well as the contents of the data segment)</li>
  <li>Created a new Makefile target for building <code class="language-plaintext highlighter-rouge">EGAME.EXE</code> and its dependencies, including the autogenerated files</li>
  <li>Wrote the code for <code class="language-plaintext highlighter-rouge">main()</code> using the IDA listing as guidance</li>
  <li>Ran the build and automated comparison with <code class="language-plaintext highlighter-rouge">mzdiff</code> to the original, tweaked until identical code was emitted</li>
</ol>

<p>The good news is that now that all that’s done, reconstructing subsequent routines will be much easier, and I’m sure the reconstruction work will see progress, even if it may take a while before it’s complete.</p>

<p>Here is the output from the diffing tool with some statistics at the end:</p>

<pre>
ninja@RYZEN:f15se2-re$ make verify-egame
mzretools/debug/mzdiff bin/egame.exe:0x10 build/egame.exe:[558bec83ec??c746] --verbose --loose --ctx 30 --map map/egame.map
Comparing code between reference (entrypoint 0000:0010/000010) and target (entrypoint 0000:0010/000010) executables
New comparison location 0000:0010/000010, queue size = 0
--- Now @0000:0010/000010, routine 0000:0010-0000:0146[000137]: main [near], block 000010-000146[000137], target @0000:0010/000010
0000:0010/000010: push bp                          == 0000:0010/000010: push bp
0000:0011/000011: mov bp, sp                       == 0000:0011/000011: mov bp, sp
0000:0013/000013: sub sp, 0x6                      =~ 0000:0013/000013: sub sp, 0x4
0000:0016/000016: mov word [bp-0x02], 0x0          == 0000:0016/000016: mov word [bp-0x02], 0x0
[...]
Reached end of routine block @ 0000:0146/000146
Completed comparison of routine main, no more reachable blocks
New comparison location 0000:0688/000688, queue size = 13
--- Now @0000:0688/000688, routine 0000:0688-0000:06e0[000059]: routine_14 [near], block 000688-00069a[000013], target @0000:015f/00015f
<r>0000:0688/000688: push bp                          != 0000:015f/00015f: ret</r> 
ERROR: Instruction mismatch in routine <r>routine_14</r> at 0000:0688/000688: push bp != 0000:015f/00015f: ret
[...]
--- Routine map stats (static):
Routine map of 400 routines covers 71988/0x11934 bytes (42% of the load module)
Reachable code totals 71773/0x1185d bytes (99% of the mapped area)
Unreachable code totals 1494/0x5d6 bytes (2% of the mapped area)
Excluded 63 routines take 4809/0x12c9 bytes (6% of the mapped area)
Reachable area of excluded routines is 4986/0x137a bytes (6% of the reachable area)
--- Comparison run stats (dynamic):
Seen 2 routines, visited 2 and compared 311/0x137 bytes of opcodes inside (0% of the reachable area)
Ignored (seen but excluded) 0 routines totaling 0/0x0 bytes (0% of the reachable area)
Practical coverage (visited &amp; compared + ignored) is 311/0x137 (<g>0%</g> of the reachable area)
Theoretical(*) coverage (visited &amp; compared + reachable excluded area) is 5297/0x14b1 (7% of the reachable area)
Missed (not seen and not excluded) 335 routines totaling 66779/0x104db bytes (<r>92%</r> of the covered area)
(*) Any routines called only by ignored routines have not been seen and will lower the practical score,
    but theoretically if we stepped into ignored routines, we would have seen and ignored any that were excluded.
DEBUG: Dumping visited map of size 0x7169 starting at 0x0 to tgt.visited
Building code map from search queue contents: 15 routines over 2 segments
Saving target map to map/egame.tgt
Saving code map (routines = 15) to map/egame.tgt, reversing relocation by 0x0000
Comparison result: mismatch
make: *** [Makefile:256: verify-egame] Error 1
</pre>

<p>So, as expected, <code class="language-plaintext highlighter-rouge">main()</code> was matched, and the comparison failed on <code class="language-plaintext highlighter-rouge">routine_14</code> which I need to do next. You can see that I have completed 0% of the reconstruction, although if we count ignored routines (libc and such) as completed, the completion stat climbs to 7%, so I have about 92% to go.</p>
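
<p>The percentages can be reproduced from the byte counts in the log. The sketch below is an illustration only: the <code class="language-plaintext highlighter-rouge">percent</code> helper and the constant names are hypothetical, and it assumes the tool truncates rather than rounds, which is consistent with every figure printed above. It also assumes that the “covered area” in the “Missed” line refers to the 71988-byte routine map, since dividing by the reachable size would give 93% instead of 92%.</p>

```c
#include <assert.h>

/* Integer percentage, truncated rather than rounded (an assumption that
   happens to reproduce the figures in the log). */
unsigned long percent(unsigned long part, unsigned long whole) {
    return part * 100UL / whole;
}

/* Byte counts copied from the mzdiff output; the names are hypothetical. */
enum {
    MAPPED     = 71988,  /* routine map size */
    REACHABLE  = 71773,  /* reachable code */
    COMPARED   = 311,    /* bytes visited and compared (just main) */
    EXCL_REACH = 4986,   /* reachable area of excluded routines */
    MISSED     = 66779   /* not seen and not excluded */
};

/* Practical coverage: compared bytes over the reachable area. */
unsigned long practical_coverage(void) {
    return percent(COMPARED, REACHABLE);
}

/* Theoretical coverage: compared plus reachable excluded bytes. */
unsigned long theoretical_coverage(void) {
    return percent(COMPARED + EXCL_REACH, REACHABLE);
}

/* Share still to do, relative to the mapped area (assumed denominator). */
unsigned long missed_share(void) {
    return percent(MISSED, MAPPED);
}
```

<p>Under those assumptions the three functions give 0, 7 and 92, matching the practical coverage, theoretical coverage and missed figures in the output.</p>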

<p>Additional good news is that the duplicate search functionality developed in the tooling was not a total waste, even if the results were not earth-shattering. Upon repeating the signatures search from <code class="language-plaintext highlighter-rouge">START.EXE</code> for <code class="language-plaintext highlighter-rouge">EGAME.EXE</code> with improved routine boundaries and smaller routines included, it detected 11% of <code class="language-plaintext highlighter-rouge">EGAME.EXE</code> as duplicate of code that I already reconstructed before, although that does include the libc functions, so it’s really just 11-7=4%. Together with the lousy 3% it found as coming from the leaked <em>Fleet Defender</em> codebase, that’s 4+3=7% that I don’t need to do, or at least not from scratch. It’s a rough approximation, but you could say I’m about 7(duplicates from <code class="language-plaintext highlighter-rouge">START.EXE</code> and <em>Fleet Defender</em>)+7(libc)=14% done without really doing much. 😉 I’m pretty sure some routines will turn up as unreachable, same as it was with <code class="language-plaintext highlighter-rouge">START.EXE</code>, so that could further limit the extent of the reconstruction. But there’s no way around it, the bulk of the work is still ahead of me. Still, it’s not as daunting as first starting out because I know much more about how the game works, I have the layout of some common structures and overlay call jump table down, so it’s “just” a matter of going through all the opcodes and writing the correct C code.</p>

<p>This is it for now, I’ll update when I come across something interesting, or if I hit a significant milestone.</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry><entry><title type="html">The F14-sized elephant in the room</title><link href="/f15-se2/2025/02/27/elephant.html" rel="alternate" type="text/html" title="The F14-sized elephant in the room" /><published>2025-02-27T00:00:00+00:00</published><updated>2025-02-27T00:00:00+00:00</updated><id>/f15-se2/2025/02/27/elephant</id><content type="html" xml:base="/f15-se2/2025/02/27/elephant.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<p>Yeah, I know, I kind of suggested (if not exactly promised) in the last post that I would let go of fiddling with the tooling and get back into reconstructing the source code for the game. But let’s stop kidding ourselves; there’s an elephant in the room that we need to talk about.</p>

<p><img src="/images/elephant.webp" alt="" class="center-image-narrow" /></p>

<p>The pre-release source code for Microprose’s 1994 Fleet Defender got <a href="https://archive.org/details/f-14-src">leaked</a> apparently as far back as the 90s, being passed around on BBSes. I felt kind of silly when I heard about it a few years into my F15SE2 project because I was completely unaware of it. Could it be that I had (most) of the source code available from the very beginning and just didn’t know? That would be pretty embarrassing. At least I did not go into the main flight game engine in <code class="language-plaintext highlighter-rouge">EGAME.EXE</code> yet, and if I could match the leaked source code to the game binary routines, it could shave <strong>years</strong> off my reversing efforts.</p>

<p>As far as I understand it, based on superficial similarities and information from ex-Microprose employees that were kind enough to talk to me, the late-80s-to-early-90s Microprose MS-DOS flight sim codebase was evolving and being shared between projects as new games were being developed. The origins of this codebase are somewhat murky, but Sid Meier gives some hints in his <a href="https://www.goodreads.com/book/show/50489373-sid-meier-s-memoir">book</a>:</p>

<blockquote>
  <p>(…) just a few months after <em>Red Storm Rising</em> hit shelves, I took the opportunity to return to the flight simulator genre with a game called <em>F-19 Stealth Fighter</em>. It was a half upgrade, half sequel to an existing game called <em>Project Stealth Fighter</em>, with the major distinction being that this version would be developed on the IBM personal computer. A few older games had been directly ported up to the new system, but they didn’t take advantage of the new technology; they just looked like C64 games running on a bigger machine. (…) I was intrigued by the chance to explore this topic with an entirely new code base (…)</p>
</blockquote>

<p>That makes it sound like <em>F-19</em> was largely developed from scratch even though it’s possible they picked some bits from their earlier DOS titles. The timeline of this codebase would therefore go something like this:</p>

<ol>
  <li>F-19 Stealth Fighter (1988)</li>
  <li>F-15 Strike Eagle 2 (original 1989, expansion 1991)</li>
  <li>F-117A Nighthawk Stealth Fighter 2.0 (1991)</li>
  <li>F-15 Strike Eagle III (1992)</li>
  <li>Fleet Defender (1994)</li>
</ol>

<p>There have been a bunch of MPS games released in between these dates, including some flight simulators. So, does that mean that parts of that common code are in <em>Knights of the Sky</em> (1990), <em>Gunship 2000</em> (1991), <em>B-17 Flying Fortress</em> (1992) or even <em>M1 Tank Platoon</em> (1989)? I don’t know, probably yes. I’m pretty sure parts of the video display code, sprite and overlay handling are shared even with non-sims like <em>Civilization</em> (1991). I especially don’t know what happened to the code between <em>F-117</em> and <em>F-15 III</em> because they look radically different and I’m not sure if there was an intermediate step, or whether <em>F-15 III</em> just took the <em>F-117</em> code and improved on it heavily. It’s also a big unknown how the ports were handled; <em>F-19</em> was on both the Amiga and Atari ST, <em>F-15 II</em> was additionally ported to the Sega Genesis, <em>F-117</em> was likewise on the Amiga, and it was only with <em>F-15 III</em> that MPS started targeting DOS exclusively with this codebase. I wonder if any of the code written for the PC ended up in any of the ports. There were C compilers available for the Amiga like Aztec C and Lattice C, but I think the emitted code would be too inefficient for the needs of game development, and I expect that the whole thing was rewritten in assembly for any ports MPS might have done, but I could be wrong.</p>

<p>Bottom line, there are at least 3 product generations between <em>F-15 II</em> and <em>Fleet Defender</em>. That makes it unlikely for them to be largely identical, but surely some routines for flight dynamics and/or 3D projection could have remained (mostly) unchanged? The possibility that the answers to F15SE2’s most difficult riddles might lie in the Fleet Defender code repository is something that I could not resist, so I downloaded it and started looking around.</p>

<p>There are of course ethical/legal questions involved. The source code was not officially released and is therefore, for all intents and purposes, stolen. I thought long and hard about whether or not I should look at it at all, because I don’t want to compromise my project, but I finally decided that the opportunity was too great to miss. However, I decided I would under no circumstances lift any part of the code from the leaked repository into my own project. My usage of the leaked code will be limited to studying it and potentially applying the conclusions to my project – if there are any to apply. The leak was of a pre-release version and the source code is incomplete (more on that below), so any harm to the sales of <em>Fleet Defender</em> would be negligible, and it’s more than 30 years in the past. These days, the leaked source code is an artifact of historical research. I think of it as something like one of the works of ancient art in Berlin’s <a href="https://en.wikipedia.org/wiki/Neues_Museum">Neues Museum</a>. Next to many of these items you will find notes euphemistically saying “Acquired in 1935” or “Found in Greece in 1918”, which means they were picked up for peanuts or just straight up stolen from other countries and brought over to Germany back when hardly anybody cared about those treasures, or realized they could become valuable. Their return is a contentious matter that’s still under discussion. There’s also an argument to be made that if the Germans hadn’t taken and preserved the items, they would probably have been lost to neglect or chopped up to make road gravel. But does that mean that we can’t or shouldn’t admire them in the meantime? No, that would be an even greater waste, so I’m not going to pretend like this gem of early 3D gaming technology doesn’t exist, either.</p>

<p>With that out of the way, let’s see what we can learn from that source code. There are 79 C files, 27 header files and 17 ASM files, and the whole source repository weighs in at around 4MB. The sources are about half of that, with the rest being what looks like various campaign scenario files, damage tables and aircraft stats, both in readable formats and compiled into binary equivalents, with some of the compiler tooling included in the repository. Those might be useful for modding and/or creating new missions. Several files contain C code while having odd extensions like <code class="language-plaintext highlighter-rouge">.bk1/.ori/.mik</code> – these seem to be backup/work copies of the equivalent <code class="language-plaintext highlighter-rouge">.c</code> files, kept around for tweaking.</p>

<h2 id="a-look-at-the-code">A look at the code</h2>

<p>Here are some interesting takeaways from the <em>Fleet Defender</em> source code:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cm">/*----------*/</span>
<span class="cm">/*
/*  File:   3DObject.C
/*
/*  Auth:   Andy Hollis  11/25/87 &lt;&lt;&lt;&lt;&lt; Amazing !!!
/*
/*  Edit:   Hacked yet again for F15 III, AWH - 8/92
/*          AND YET AGAIN FOR F-14, MJM 4/93
/*
/*  Routines to draw a 3D object. - NOT!!
/*
/*----------*/</span></code></pre></figure>

<p>Most of the files in the repository are dated 1990-1994, with the oldest ones (containing trig function value tables) from 1989, but the 1987 date in this header appears to be the oldest one mentioned anywhere in the codebase, and judging by the comment, the Fleet Defender dev was appropriately impressed. The comment appears to confirm that the F-19 codebase has been reused in multiple projects over its lifetime, and parts of it made it into <em>F-15 III</em> and <em>Fleet Defender</em>. The references to “F-15 (III)” are the most numerous in the codebase, with a few stray occurrences of “F-19/Stealth Fighter”, but surprisingly nothing about “F-117”. The file itself contains just 5 short routines seemingly related to 3D views (or not?).</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">//***************************************************************************</span>
<span class="c1">//*</span>
<span class="c1">//*    AWG9.C</span>
<span class="c1">//*</span>
<span class="c1">//*    Author: Mike McDonald (adapted from Bill Beckers F15-III:APG-70)</span>
<span class="c1">//*</span>
<span class="c1">//*    Fleet Defender: F-14 Tomcat</span>
<span class="c1">//*    Microprose Software, Inc.</span>
<span class="c1">//*    180 Lakefront Drive</span>
<span class="c1">//*    Hunt Valley, Maryland  21030</span>
<span class="c1">//*</span>
<span class="c1">//***************************************************************************</span></code></pre></figure>

<p>Again, it looks like the <em>F-15 III</em> radar (<a href="https://en.wikipedia.org/wiki/AN/APG-63_radar_family">APG-70</a>) code was tweaked and reused for the <a href="https://en.wikipedia.org/wiki/AN/AWG-9">AWG-9</a> radar in <em>Fleet Defender</em>. I don’t expect to find this in <em>F-15 II</em> though, whose radar is extremely simplistic and arcade-like.</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">//***************************************************************************</span>
<span class="c1">// FILE: GMAIN.C</span>
<span class="c1">//</span>
<span class="c1">// Fleet Defender - F14 Tomcat</span>
<span class="c1">// Project Manager: Scott Spanburg</span>
<span class="c1">// Revised by: Mike McDonald</span>
<span class="c1">//</span>
<span class="c1">// Adapted from F-15 Strike Eagle code by Sid Mieyer, Andy Hollis</span>
<span class="c1">//</span>
<span class="c1">//***************************************************************************</span>
<span class="c1">// [...]</span>

<span class="kt">char</span>    <span class="o">*</span><span class="n">F14CNUM</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="s">"F14R"</span> <span class="p">};</span>
<span class="kt">char</span>    <span class="o">*</span><span class="n">F14SNAME</span><span class="p">[</span><span class="mi">30</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="s">"Quality Assurance"</span> <span class="p">};</span>
<span class="kt">char</span>    <span class="o">*</span><span class="n">F14SPER</span><span class="p">[</span><span class="mi">30</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="s">"Vaughn Thomas"</span> <span class="p">};</span>
<span class="kt">char</span>    <span class="o">*</span><span class="n">F14VNUM</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>  <span class="o">=</span> <span class="p">{</span> <span class="s">"1.18"</span> <span class="p">};</span>
<span class="c1">// [...]</span>

<span class="n">main</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="n">argv</span><span class="p">,</span> <span class="n">envp</span><span class="p">)</span>
<span class="kt">int</span>     <span class="n">argc</span><span class="p">;</span>
<span class="kt">char</span>    <span class="o">**</span><span class="n">argv</span><span class="p">;</span>
<span class="kt">char</span>    <span class="o">**</span><span class="n">envp</span><span class="p">;</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">z</span><span class="p">;</span>
    <span class="n">save_video_state</span><span class="p">();</span>
    <span class="n">InitOverlay</span><span class="p">(</span><span class="n">LoadOverlay</span><span class="p">(</span><span class="s">"Mgraphic.exe"</span><span class="p">,</span><span class="s">"Fonts.F15"</span><span class="p">));</span>
    <span class="n">TurnOnGraphicsMode</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="n">mclear_screen</span><span class="p">();</span>
    <span class="n">mprintf</span><span class="p">(</span><span class="s">"****  DEBUGGING INFORMATION IS ON!!! - NOT FOR QA ****"</span><span class="p">);</span>
    <span class="n">InitOptions</span><span class="p">();</span>
    <span class="n">LoadSoundConfig</span><span class="p">();</span>
    <span class="n">InitSound</span><span class="p">();</span>
    <span class="n">InitGraphicPages</span><span class="p">();</span>
    <span class="n">InitGraph</span><span class="p">(</span><span class="sc">'M'</span><span class="p">);</span>
<span class="c1">// [...]</span>
<span class="p">}</span>

<span class="c1">// [...]</span>
<span class="n">MainGameLoop</span><span class="p">()</span>
<span class="p">{</span>
    <span class="n">SetJoysticks</span><span class="p">(</span><span class="n">StickType</span><span class="p">);</span>
    <span class="n">KBInit</span><span class="p">();</span>
    <span class="k">do</span> <span class="p">{</span>
        <span class="n">UpdateTime1</span><span class="p">();</span>
        <span class="n">UpdateTime2</span><span class="p">();</span>
        <span class="n">TakeInputs</span><span class="p">();</span>
        <span class="n">LocalCmds</span><span class="p">();</span>
        <span class="k">if</span> <span class="p">((</span><span class="o">--</span><span class="n">DisplayFrame</span><span class="p">)</span><span class="o">==</span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">GenDsp</span><span class="p">();</span>
            <span class="n">DoCockpitDisplays</span><span class="p">();</span>
            <span class="n">Messages</span><span class="p">();</span>
        <span class="p">}</span>
        <span class="n">ProcessInputs</span><span class="p">();</span>
        <span class="n">FLIGHT</span><span class="p">();</span>
        <span class="n">Stealth</span><span class="p">();</span>
        <span class="n">DoPlayerOnTheCat</span><span class="p">();</span>  <span class="c1">// keeps plane in sink with boat</span>
        <span class="n">AWG9</span><span class="p">();</span>
        <span class="n">TEWS_SYS</span><span class="p">();</span>
        <span class="n">UpdatePalette</span><span class="p">();</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">DisplayFrame</span><span class="o">==</span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">Flip</span><span class="p">();</span>
            <span class="n">DisplayFrame</span><span class="o">=</span><span class="n">Speedy</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">BEND</span><span class="p">);</span>
        <span class="n">Release3DMemory</span><span class="p">();</span>
    <span class="n">MouseHIDE</span><span class="p">();</span>
    <span class="n">ClearPage</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">BLACK</span><span class="p">);</span>
    <span class="n">ClearPage</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">BLACK</span><span class="p">);</span>
    <span class="n">SndShutdown</span><span class="p">();</span>
    <span class="n">UnInitSpeech</span><span class="p">();</span>
    <span class="n">DumpLogFile</span><span class="p">();</span>
    <span class="n">GetRidOfKeyJoy</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>

<p>This contains the main function of the game and the main game loop itself. What’s interesting is that it was apparently adapted from <em>F-15</em>, and the fact that Sid is mentioned (who did not work on <em>F-15 III</em>) makes me think this was all the way from <em>F-15 II</em>. It’s not that big a deal because the function is rather simple, just calls into other functions and would be modified for <em>Fleet Defender</em> anyway, but it’s nevertheless cool to look at, and the routine names, structure definitions and other references might be useful. It’s interesting to see it init and load the <code class="language-plaintext highlighter-rouge">mgraphic.exe</code> video overlay driver, and I also remember setting the <code class="language-plaintext highlighter-rouge">M</code> mode value for VGA graphics in my code. The source code contains the name of Vaughn Thomas who was a tester at MPS during that time, but I don’t think that means he was the source of the leak - some developer probably created this build for Vaughn, gave the binaries to him for testing, and the source code was pilfered sometime at that point.</p>
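<p>One detail of the loop worth spelling out: the input, flight and radar steps appear to run on every pass, while the drawing calls only run when <code class="language-plaintext highlighter-rouge">DisplayFrame</code> counts down to zero, after which it is reset to <code class="language-plaintext highlighter-rouge">Speedy</code>. Below is a minimal sketch of just that counting logic. The function name is made up for illustration, and I am assuming the counter starts out equal to <code class="language-plaintext highlighter-rouge">Speedy</code>, which the excerpt does not show.</p>

```c
#include <assert.h>

/* Count how many passes of the loop would draw a frame.
 * "speedy" is a stand-in for the Speedy setting (must be >= 1).
 * Assumption: the counter starts out equal to speedy. */
int count_rendered_frames(int iterations, int speedy)
{
    int display_frame = speedy;
    int rendered = 0;
    int i;

    assert(speedy >= 1);
    for (i = 0; i < iterations; i++) {
        /* timing and input steps would run here on every pass */
        if (--display_frame == 0) {
            rendered++;              /* GenDsp(), DoCockpitDisplays(), Messages() */
        }
        /* flight model, radar and palette steps would run here on every pass */
        if (display_frame == 0) {
            display_frame = speedy;  /* Flip(), then re-arm the counter */
        }
    }
    return rendered;
}
```

<p>With a value of 3, for example, 12 passes through the loop would draw 4 frames, so a larger <code class="language-plaintext highlighter-rouge">Speedy</code> would mean more simulation steps per drawn frame.</p>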

<p>You can also see the ancient origins of the codebase in the fact that many routines are written in the pre-ANSI K&amp;R style of C, with no return type (which defaulted to <code class="language-plaintext highlighter-rouge">int</code>), and with the parameter types declared between the parameter list and the opening brace, as in <code class="language-plaintext highlighter-rouge">main()</code> above.</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cm">/************************************************************************
*                                                                       *
*       Project: Stealth Fighter(TM)                                    *
*                                                                       *
*&lt;t&gt;    Flight Equations                                                *
*                                                                       *
*       Author: Jim Synoski                                             *
*       Written: Jan 1988                                               *
*       Last Editted: Jan 22,1988                                       *
*                                                                       *
*       Copyright (C) 1988 by MicroProse Software, All Rights Reserved. *
*                                                                       *
************************************************************************/</span></code></pre></figure>

<p>This comment is present in <code class="language-plaintext highlighter-rouge">flight2.c</code> and in <code class="language-plaintext highlighter-rouge">views.c</code>, although the latter carries a later date of 1993. Both files are dense with calculations, seemingly for flight dynamics and 3D view processing, and the fact that they appear to originate from <em>F-19</em> makes me hope I can find traces of this code in <em>F-15 II</em>, which would be super helpful.</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cm">/*  File:   Planes.c                              */</span>
<span class="cm">/*  Author: Sid Meier                             */</span>
<span class="cm">/*                                                */</span>
<span class="cm">/*  Game logic for Stealth: enemy planes          */</span>
<span class="c1">// [...]</span>
<span class="c1">// The Strategy:</span>
<span class="c1">// Planes have three overall plans:</span>
<span class="c1">//      o PATROL means to fly amongst nearby enemy entries in the Rdrs list,</span>
<span class="c1">//        quitting when "time" runs out.  When this occurs, the nearest enemy</span>
<span class="c1">//        base is chosen as destination and the plane flys home or just</span>
<span class="c1">//        disappears.  When near the base, landing is initiated.  When landed,</span>
<span class="c1">//        the plane is deactivated.  PATROL is the default plane type.</span>
<span class="c1">//      o LOITER is similar to PATROL, except that the target is an enemy</span>
<span class="c1">//        base and never changes.  This gives infinite touch-n-go's.</span>
<span class="c1">// Overriding factors include:</span>
<span class="c1">//      o If he "pings" you, he comes after you until he loses you.</span>
<span class="c1">//      o If "detected" planes will go after your last known position</span>
<span class="c1">// Slots in the planes array are as follows:</span>
<span class="c1">//      o Last four are for CloseBase touch-n-gos</span>

<span class="c1">// [...]</span>
<span class="cm">/*  File:   Radars.c                              */</span>
<span class="cm">/*  Author: Sid Meier                             */</span>
<span class="cm">/*                                                */</span>
<span class="cm">/*  Game logic for Stealth: enemy radars          */</span>
<span class="cm">/*                                                */</span>
<span class="c1">// [...]</span>
<span class="c1">// Radar detection check</span>
<span class="n">detect</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// [...]</span>
        <span class="c1">// SAM WILL BE FIRED IF:</span>
        <span class="c1">// 1) MISSILE IS ASSOCIATED WITH RADAR INSTALLATION</span>
        <span class="c1">// 2) NOT OUT OF MISSILES</span>
        <span class="c1">// 3) RADAR ALERT LEVEL IS HIGH ENOUGH</span>
        <span class="c1">// 4) PLAYER IS NOT CLOSE TO FRIENDLY BASE</span>
        <span class="c1">// 5) PLAYER IS WITHIN MISSILE ENVELOPE</span>
<span class="c1">// [...]</span>
<span class="p">}</span>
<span class="c1">// [...]</span>
<span class="c1">// General radar detection algorithm for ground radars and planes</span>
<span class="n">Rsignal</span><span class="p">(</span><span class="n">COORD</span> <span class="n">x</span><span class="p">,</span><span class="n">COORD</span> <span class="n">y</span><span class="p">,</span><span class="kt">int</span> <span class="n">z</span><span class="p">,</span> <span class="kt">int</span> <span class="n">type</span><span class="p">,</span><span class="kt">int</span> <span class="o">*</span><span class="n">ang</span><span class="p">,</span><span class="kt">int</span> <span class="o">*</span><span class="n">dst</span><span class="p">,</span><span class="kt">int</span> <span class="n">isfriend</span><span class="p">,</span><span class="kt">int</span> <span class="o">*</span><span class="n">targetnum</span><span class="p">)</span></code></pre></figure>

<p>Likewise, these two files, credited to Sid and seemingly originating from <em>F-19</em>, might well carry over to <em>F-15 II</em>. <code class="language-plaintext highlighter-rouge">planes.c</code> is over 6000 lines long! The radar handling logic looks to cover both SAMs and air-to-air, and I think it could be more useful than the advanced AWG-9 stuff.</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">stealth</span><span class="p">.</span><span class="n">c</span><span class="o">:</span><span class="mi">495</span><span class="o">:</span>    <span class="k">if</span> <span class="p">(</span><span class="n">FTicks</span><span class="o">&lt;</span><span class="mi">4</span><span class="p">)</span> <span class="p">{</span>                 <span class="cm">/* Don't exceed 15 FPS */</span>
<span class="n">stealth</span><span class="p">.</span><span class="n">c</span><span class="o">:</span><span class="mi">499</span><span class="o">:</span>        <span class="n">TickDelay</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>              <span class="cm">/* Don't run any slower than 3 FPS */</span></code></pre></figure>

<p>I’m not sure if this means that the entire game is capped at 15 FPS, or whether it’s just the stealth handling code running at that speed. Probably it’s the latter, but as I remember it, <em>Fleet Defender</em> isn’t very smooth even on fast machines.</p>
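<p>For what it’s worth, the numbers in those two comments line up if the tick counter runs at 60 Hz: 4 ticks per frame gives 15 FPS, and 3 FPS corresponds to 20 ticks per frame. The 60 Hz rate is purely my assumption, chosen because it makes the arithmetic work; I have not confirmed it in the source. The helper names below are made up for illustration.</p>

```c
#include <assert.h>

/* Assumed timer rate. 60 Hz is a guess that makes "4 ticks" match the
 * "15 FPS" comment; it is not taken from the source code. */
#define ASSUMED_TICKS_PER_SECOND 60

/* Frame rate that results from spending this many ticks on each frame. */
int fps_for_frame_ticks(int ticks_per_frame)
{
    assert(ticks_per_frame > 0);
    return ASSUMED_TICKS_PER_SECOND / ticks_per_frame;
}

/* Number of ticks one frame takes at the given frame rate. */
int frame_ticks_for_fps(int fps)
{
    assert(fps > 0);
    return ASSUMED_TICKS_PER_SECOND / fps;
}
```

<p>Under that assumption, the <code class="language-plaintext highlighter-rouge">FTicks&lt;4</code> check would trigger for any frame that finished faster than a 15 FPS frame time.</p>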

<p>Finally, some fun finds:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// THIS IS NOT MINE!!! I DID NOT DO THIS!!! I PLEDGE TO RIP THE HEART</span>
<span class="c1">// FROM THE INDIVIDUAL WHO DID!  THIS IS JUST A WORSE VERSION OF GOTO!</span>
<span class="c1">// I WILL CHANGE THIS AT SOME LATER DATE - DON'T MESS WITH IT, DON'T CHANGE</span>
<span class="c1">// IT!  THIS MEANS YOU!!! Thanks for your support - MJM</span>
<span class="k">static</span> <span class="kt">jmp_buf</span> <span class="n">resetmark</span><span class="p">;</span>

<span class="c1">// [...]</span>

<span class="c1">// THIS SUCKS!!! THIS WAS NOT MY IDEA - I REALIZE THAT THIS TAKES UP TO MUCH</span>
<span class="c1">// SPACE, BUT BLAM MIKE R.</span>
<span class="n">animtype</span> <span class="n">RioHead</span><span class="p">[</span><span class="mi">15</span><span class="p">]</span> <span class="o">=</span>
<span class="p">{</span> <span class="c1">//... </span>
<span class="p">}</span></code></pre></figure>

<p>The former comment applies to a global symbol used to execute <code class="language-plaintext highlighter-rouge">longjmp()</code> in <code class="language-plaintext highlighter-rouge">awg9.c</code>, the latter to some statically initialized arrays in <code class="language-plaintext highlighter-rouge">riohead.c</code>. Looks like developer frustration is a given on pretty much any software project. 😉</p>

<h2 id="great-now-what">Great, now what?</h2>

<p>Now, how do we go about figuring out if any of this code is actually present in <em>F-15 II</em>, and if so, where? Fortunately, I <a href="/f15-se2/2025/01/29/newtooling2.html">recently developed</a> some tooling for extracting routine signatures from binaries and locating them in other binaries, which I originally used to see if any work I did for <em>F-15 II</em>’s <code class="language-plaintext highlighter-rouge">START.EXE</code> would carry over to <code class="language-plaintext highlighter-rouge">EGAME.EXE</code>. This time, I will use it to search for bits of <code class="language-plaintext highlighter-rouge">F14.EXE</code> in <code class="language-plaintext highlighter-rouge">EGAME.EXE</code>. I am a bit lucky here. Despite <em>Fleet Defender</em>’s relatively late release date of 1994, the game does not appear to use protected mode, or much 32-bit code either (except for a few minor sections in assembly files, still in real mode). It’s kind of surprising given that this was the year <em>Doom II</em> came out, but for better or worse, <em>Fleet Defender</em> seems to run 16-bit real mode code. If it had been rewritten to use protected mode, the code would likely be mostly useless to me.</p>
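<p>To give a rough idea of what “locating a signature” means, here is a deliberately simplified sketch. It is not how my actual tooling works. It assumes a signature is just a run of bytes plus a mask that marks bytes allowed to differ, such as call targets or data offsets that change between builds. The function name and the mask layout are invented for this example.</p>

```c
#include <stddef.h>

/* Return the offset of the first place where sig matches inside image,
 * or -1 if there is none. mask[i] == 0 marks a wildcard byte that may
 * differ (e.g. an address operand); nonzero means the byte must match. */
long find_signature(const unsigned char *image, size_t image_len,
                    const unsigned char *sig, const unsigned char *mask,
                    size_t sig_len)
{
    size_t pos, i;

    if (sig_len == 0 || sig_len > image_len)
        return -1;
    for (pos = 0; pos + sig_len <= image_len; pos++) {
        for (i = 0; i < sig_len; i++) {
            if (mask[i] && image[pos + i] != sig[i])
                break;          /* mismatch on a byte that must match */
        }
        if (i == sig_len)
            return (long)pos;   /* every non-wildcard byte matched */
    }
    return -1;
}
```

<p>The wildcards are the important part: two builds of the same routine will rarely be byte-identical, because the addresses they reference move around.</p>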

<p>The first thing I need to do then is build the code. There is a DOS-era makefile included in the source tree, but I ended up making my own to have a bit more control over the build process and to integrate well with my tooling for wrapping MS C. Incidentally, <em>Fleet Defender</em> seems to have been compiled with MSC 7.0, but I need to build it with MSC 5.1 to have the code match <em>F-15 II</em> as closely as possible. For that, I had to make a couple of tweaks to the code, mostly to large, statically initialized arrays defined within functions that MSC 5.1 did not like, but it was a simple matter of moving them up to global scope. I also had to exclude some C files which used inline assembly, which is not supported in MSC 5.1, but there were only a few and they contained a minuscule amount of code.</p>
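<p>The array tweak looks roughly like the sketch below. The table and function names are invented for illustration, and the contents are dummy values, not anything from the leaked code. One caveat with this kind of change: if the original array was a plain local that the function modified, a global copy keeps those modifications between calls, so it is only a safe move for tables that are read and never written.</p>

```c
/* After the change: the initialized table lives at file scope. */
static int example_table[4] = { 10, 20, 30, 40 };

int lookup_example(int index)
{
    /* Before the change, the table was defined here, inside the function,
     * with the same initializer list:
     *     int example_table[4] = { 10, 20, 30, 40 };
     */
    return example_table[index & 3];   /* mask keeps the index in range */
}
```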

<p>Unfortunately, my tooling cannot (yet) parse object files in the OMF format, only EXEs, so I also need to link it into an executable. Here we come to an unfortunate realization: the code is incomplete. Most of the missing functions are declared in the header <code class="language-plaintext highlighter-rouge">library.h</code>:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cm">/*
╔═══════════════════════════════════════════════════════════════════════╗
║********************   MPS Labs Graphic Library   *********************║
╟───────────────────────────────────────────────────────────────────────╢
║  File:  Library.h                                                     ║
║                                                                       ║
║  Auth:  David McKibbin                                                ║
║                                                                       ║
║  Edit:  dtm   July 20, 1992         1:18 pm                           ║
║                                                                       ║
║  Note:  HEADER definitions for MPSLIB?.LIB                            ║
║                                                                       ║
╟───────────────────────────────────────────────────────────────────────╢
║   Copyright (c) 1991 by MicroProse Software, All Rights Reserved.     ║
╚═══════════════════════════════════════════════════════════════════════╝
*/</span>
<span class="c1">// [...]</span>
<span class="cm">/***************    USER/RESIDENT "C" prototypes    ***************/</span>
<span class="k">extern</span>    <span class="kt">int</span>         <span class="nf">OpenFile</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">file</span><span class="p">,</span> <span class="kt">int</span> <span class="n">attrib</span><span class="p">);</span>    <span class="cm">/* fileio.c */</span>
<span class="k">extern</span>    <span class="kt">void</span>         <span class="nf">CloseFile</span> <span class="p">(</span><span class="kt">int</span> <span class="n">fh</span><span class="p">);</span>
<span class="k">extern</span>    <span class="kt">int</span>         <span class="nf">CreateFile</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">file</span><span class="p">);</span>
<span class="c1">// [...]</span>
<span class="cm">/***************    USER/RESIDENT "ASM" prototypes    ***************/</span>
<span class="k">extern</span>    <span class="kt">void</span>    <span class="n">far</span>  <span class="nf">InitSGF</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">palette</span><span class="p">);</span>        <span class="cm">/* lzwio.asm */</span>
<span class="k">extern</span>    <span class="kt">void</span>    <span class="n">far</span>  <span class="nf">InitSDF</span> <span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">extern</span>    <span class="kt">void</span>    <span class="n">far</span>  <span class="nf">ReadSDF</span> <span class="p">(</span><span class="kt">char</span> <span class="n">far</span> <span class="o">*</span><span class="n">buffer</span><span class="p">,</span> <span class="kt">int</span> <span class="n">count</span><span class="p">);</span>
<span class="c1">// [...]</span>
<span class="cm">/***************    Graphic Library prototypes    ***************/</span>
<span class="k">extern</span>    <span class="kt">void</span>    <span class="n">far</span>  <span class="nf">AddLine</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">RowBuff</span><span class="p">,</span> <span class="kt">int</span> <span class="n">page</span><span class="p">,</span> <span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">,</span> <span class="kt">int</span> <span class="n">count</span><span class="p">);</span>
<span class="k">extern</span>    <span class="n">UWORD</span>    <span class="n">far</span>  <span class="nf">AllocGraphicPage</span> <span class="p">(</span><span class="kt">int</span> <span class="n">page</span><span class="p">);</span>
<span class="k">extern</span>    <span class="n">UWORD</span>    <span class="n">far</span>  <span class="nf">AvailSysMem</span> <span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="c1">// [...]</span>
<span class="cm">/***************    MISC Library prototypes    ***************/</span>
<span class="k">extern</span>    <span class="kt">int</span>    <span class="n">far</span>  <span class="nf">IsKey</span> <span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">extern</span>    <span class="kt">int</span>    <span class="n">far</span>  <span class="nf">GetKey</span> <span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="k">extern</span>    <span class="kt">int</span>    <span class="n">far</span>  <span class="nf">EchoGetKey</span> <span class="p">(</span><span class="kt">void</span><span class="p">);</span>
<span class="c1">// [...]</span>
<span class="cm">/***************    SOUND Library prototypes    ***************/</span>
<span class="k">extern</span>    <span class="kt">int</span>    <span class="n">far</span>  <span class="nf">SndSysSetup</span><span class="p">();</span>
<span class="k">extern</span>    <span class="kt">int</span>    <span class="n">far</span>  <span class="nf">SndSounds</span><span class="p">();</span>
<span class="k">extern</span>    <span class="kt">void</span>    <span class="n">far</span>  <span class="nf">SndShutdown</span><span class="p">();</span></code></pre></figure>

<p>This looks quite familiar. I am pretty convinced that the “user/resident” declarations correspond to a bunch of common utility routines that I found while reconstructing F15SE2’s <code class="language-plaintext highlighter-rouge">START.EXE</code>, while the “Graphic/MISC/SOUND” prototypes are functions from the runtime-loaded <a href="/f15-se2/2023/07/12/overlays.html">overlay drivers</a>, calls to which I can see all over the place in F15SE2. This library header file is a boon in itself, as I still haven’t figured out the arguments and/or the purpose of many of those functions. Establishing the relationship between the declarations and the routines in the code will not be straightforward, since the header file seems to have the graphics routines (which are the most numerous and important) listed in alphabetical order, but I’m sure I can figure it out eventually.</p>

<p>It actually makes sense for the library code to be missing; it was probably part of a different source tree, shared between multiple game projects at MPS, and one that the <em>Fleet Defender</em> devs did not need to mess with (most of the time?). For my needs, I just stubbed all of them out and moved on. This still left me with a bunch of unresolved symbols for some routines and data from the linker. I don’t know what these are and what happened to them, but they are also missing from the source code, so it’s stub time again:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// data</span>
<span class="kt">char</span> <span class="n">far</span> <span class="o">*</span><span class="n">TILECOLORS</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">Fencer</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">ag_msg</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">far</span> <span class="o">*</span><span class="n">GREYBUF</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">MenuSpr1</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">MenuSpr2</span><span class="p">,</span> <span class="o">*</span><span class="n">MenuSpr3</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">GroundObjectScale</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">far</span> <span class="o">*</span><span class="n">GROUPCOLORS</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">**</span><span class="n">TRANSGREYPTRS</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">**</span><span class="n">GroundObjectNames</span><span class="p">;</span>
<span class="kt">long</span> <span class="n">DESIGNATED_X</span><span class="p">;</span>
<span class="kt">long</span> <span class="n">DESIGNATED_Y</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">SNDDETAIL</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">far</span> <span class="o">*</span><span class="n">STAMPCOLORS</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">ACQ_PX</span><span class="p">,</span> <span class="n">ACQ_PY</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">MSGDETAIL</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">far</span> <span class="o">***</span><span class="n">GROUPBUF</span><span class="p">;</span>
<span class="k">volatile</span> <span class="kt">int</span> <span class="n">MouseX</span><span class="p">;</span>
<span class="k">volatile</span> <span class="kt">int</span> <span class="n">MouseY</span><span class="p">;</span>
<span class="k">volatile</span> <span class="kt">int</span> <span class="n">Button</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">ag_msg_cnt</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">far</span> <span class="o">***</span><span class="n">STAMPBUF</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">MISSILETRAIL</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">**</span><span class="n">HRM_RANGES</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">WORLDDETAIL</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">far</span> <span class="o">***</span><span class="n">TILEBUF</span><span class="p">;</span>
<span class="n">UWORD</span> <span class="n">far</span> <span class="o">*</span><span class="n">CrtMask</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">InFriendly</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">_acrtused</span><span class="p">;</span> <span class="c1">// workaround for unresolved symbol when linking without libc</span>

<span class="c1">// routines</span>
<span class="kt">long</span> <span class="nf">labs</span><span class="p">(</span><span class="kt">long</span> <span class="n">X</span><span class="p">)</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">FlyGroupLine</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">FlyTileLine</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">FlyGroupine</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">FlyStampLine</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">FlyLine</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">LockAG</span> <span class="p">()</span> <span class="p">{}</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">strupr</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">s</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">int</span> <span class="n">isprint</span><span class="p">(</span><span class="kt">char</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">void</span> <span class="n">SetJoysticks</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="p">)</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">ag_err_msgs</span> <span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">ChangeWeather</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">clip_rotmap</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">WorldName</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">void</span> <span class="n">OverlaySequencePoints</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">S2MapLine</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">_fmemcpy</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">rotate_pt</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">SMapLine</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">_fstrcat</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">_fstrcpy</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">GetRidOfKeyJoy</span><span class="p">()</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">Draw3dGroundObject</span><span class="p">(</span><span class="kt">int</span> <span class="n">NUM</span><span class="p">,</span> <span class="kt">int</span> <span class="n">X</span><span class="p">,</span> <span class="kt">int</span> <span class="n">Y</span><span class="p">,</span> <span class="kt">int</span> <span class="n">Z</span><span class="p">,</span> <span class="kt">int</span> <span class="n">BlowUpFlags</span><span class="p">)</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">Draw3dGroundObjects</span><span class="p">()</span> <span class="p">{}</span></code></pre></figure>

<p>That, along with the library function stubs, got rid of the unresolved symbols. But the linker failed with a “fixup overflow” error in the data segment, meaning the size of the data exceeded 64k; I’m building the code with the medium memory model (<code class="language-plaintext highlighter-rouge">/AM</code>), so there’s just one data segment. I was probably overly greedy in trying to cram as many of the source files into the exe as possible, and I could remove some, but I actually ended up using the <code class="language-plaintext highlighter-rouge">/NOD[EFAULTLIB]</code> linker option to avoid linking in the standard C library and its associated data, because I don’t actually need the resulting executable to be runnable. Doing that, however, meant that I also had to stub out a bunch of libc routines to get it to link, but finally link it did, and I had my coveted <code class="language-plaintext highlighter-rouge">f14.exe</code>, built with MSC 5.1 with default flags.</p>

<p>The fact that the executable doesn’t run or have libc poses a different problem. Because there’s no <code class="language-plaintext highlighter-rouge">START</code> from crt0, the entrypoint for the executable is fixed at <code class="language-plaintext highlighter-rouge">0:0</code>. I need to walk the code with my <code class="language-plaintext highlighter-rouge">mzmap</code> tool to figure out where the routines begin and end, so I can extract their signatures for comparison with F15SE2, but if the main entrypoint is broken and control doesn’t really go anywhere, the walker will fail.</p>

<p>However, I <strong>do</strong> know where the routines are located, because the linker output a <code class="language-plaintext highlighter-rouge">.MAP</code> file when it created <code class="language-plaintext highlighter-rouge">f14.exe</code>, and the map file lists all public routines along with their entrypoint addresses. I implemented a new option to <code class="language-plaintext highlighter-rouge">mzmap</code>; <code class="language-plaintext highlighter-rouge">--linkmap</code> lets it ingest the linker map file as a set of “hints” for routine entrypoints, segment locations and data references as well. That adds a useful feature to <code class="language-plaintext highlighter-rouge">mzmap</code>, and while walking <code class="language-plaintext highlighter-rouge">f14.exe</code>, I found and fixed a bunch of pretty serious bugs in the walker, so I’m pretty happy that I took the effort to do so. At the end, I had a map file compatible with my tooling, with most of the <code class="language-plaintext highlighter-rouge">f14.exe</code> routine boundaries discovered.</p>
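
<p>To make the idea of using the linker map as a source of hints more concrete, here is a minimal sketch of pulling routine entrypoints out of such a file. It is not the actual <code class="language-plaintext highlighter-rouge">mzmap</code> code: the <code class="language-plaintext highlighter-rouge">segment:offset name</code> line layout under a “Publics by Value” heading is an assumption about the map format, and <code class="language-plaintext highlighter-rouge">parse_publics</code>, <code class="language-plaintext highlighter-rouge">SAMPLE_MAP</code> and the symbol names in it are made up for illustration.</p>

```python
import re

# Assumed line layout: "ssss:oooo  [Abs]  name", with both numbers in hex.
PUBLIC_RE = re.compile(
    r"^\s*([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\s+(?:Abs\s+)?(\S+)\s*$"
)


def parse_publics(map_text):
    """Return (segment, offset, name) tuples from the 'Publics by Value' section."""
    publics = []
    in_section = False
    for line in map_text.splitlines():
        if "Publics by Value" in line:
            in_section = True
            continue
        if not in_section:
            continue
        match = PUBLIC_RE.match(line)
        if match:
            seg, off, name = match.groups()
            publics.append((int(seg, 16), int(off, 16), name))
        elif line.strip():
            # Any other non-blank line (e.g. the entry point note) ends the section.
            break
    return publics


# Hypothetical excerpt, only meant to exercise the parser.
SAMPLE_MAP = """\
  Address         Publics by Name

 0000:0010       _main

  Address         Publics by Value

 0000:0010       _main
 0000:0123       _helper

Program entry point at 0000:0010
"""
```

<p>Each returned tuple can then seed the walker as a known entrypoint instead of relying on control flow from the (broken) program entrypoint.</p>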

<p>Okay, now with the F-14 executable built and the map generated, time to extract the signatures. Or is it? My <code class="language-plaintext highlighter-rouge">mzdup</code> tool scans for duplicate routines given two exes and their maps. But I want it to be more flexible, so I ended up splitting up the job, implementing <code class="language-plaintext highlighter-rouge">mzsig</code>. Given an exe and its map, plus some options to say what size of routines it should ignore, it will extract the signatures and save them to a file. This also lets me combine signatures from more than one executable in one signature file, which is pretty handy – remember that some F15SE2 routines have been built <a href="/f15-se2/2023/09/02/compiler3.html">with debug flags enabled</a>, so I need to build and extract signatures for at least two versions of <code class="language-plaintext highlighter-rouge">f14.exe</code>, one with <code class="language-plaintext highlighter-rouge">/Zi</code> and one without it.</p>

<p>The new version of <code class="language-plaintext highlighter-rouge">mzdup</code> ingests the signature file along with an exe/mapfile pair to look for matches in. I also changed the edit distance cutoff threshold from a hard value shared between all routines to a configurable ratio of the routine size, meaning bigger routines can have more differing instructions than smaller ones and still be considered matches.</p>
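
<p>A size-relative cutoff boils down to one line of arithmetic. The sketch below assumes the ratio is given as a whole percentage and that the result is rounded down; both are guesses, although they agree with the “max distance” figures in the log in the next section (61 instructions at <code class="language-plaintext highlighter-rouge">--maxdist 20</code> giving 12, 9 instructions giving 1). The function name is hypothetical.</p>

```python
def max_distance(instruction_count, ratio_percent):
    """Largest edit distance still accepted for a routine of the given size."""
    # Integer arithmetic keeps the rounding (toward zero) predictable.
    return instruction_count * ratio_percent // 100
```

<p>With a ratio of 20, a 68-instruction routine may differ in up to 13 instructions, while a 5-instruction one may differ in only 1.</p>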

<h2 id="give-us-the-damn-results-already">Give us the damn results already</h2>

<p>Well, okay, since you ask so nicely:</p>

<pre>
ninja@thinkpad:f15se2-re$ mzdup --verbose --minsize 5 --maxdist 20 ../f14/f14-ot.sig ../ida/egame.exe map/egame.map
[...]
Unable to find duplicate of _NCloudLine, 61 instructions, max distance 12
Unable to find duplicate of _NCloudLine2, 68 instructions, max distance 13
Unable to find duplicate of _NCloudLine3, 56 instructions, max distance 11
Unable to find duplicate of routine_1035, 60 instructions, max distance 12
Unable to find duplicate of _MakeBspDrawList, 16 instructions, max distance 3
Found duplicate of routine _TrgMul (9 instructions): routine_161/1000:3b2f/013b2f (9 instructions) with distance 1
Unable to find duplicate of _DPTRGMUL, 6 instructions, max distance 1
Unable to find duplicate of _MUL256DIV, 24 instructions, max distance 4
Found duplicate of routine _Icos (5 instructions): routine_158/1000:3b96/013b96 (5 instructions) with distance 1
Found duplicate of routine _Isin (5 instructions): routine_158/1000:3b96/013b96 (5 instructions) with distance 1
WARNING: Routine routine_158/1000:3b96/013b96 is a duplicate of _Isin and _Icos with equal distance
Unable to find duplicate of routine_1037, 9 instructions, max distance 1
Unable to find duplicate of _TransScaleLine, 58 instructions, max distance 11
Unable to find duplicate of _TacTransScaleLine, 31 instructions, max distance 6
Processed 944 signatures, ignored 0 as too short
Tried to find matches for 400 target exe routines (22524 instructions, 100%)
Found 22 (unique: 20) matching routines (743 instructions, <r>3%</r>)
Unable to find 378 matching routines (97%)
WARNING: Some routines were found as duplicates of more than one routine. This is possible, but unlikely. 
Try using a longer minimum routine size and/or lower distance threshold to avoid false positives.
Saving code map (routines = 400) to map/egame.map.dup, reversing relocation by 0x1000
</pre>

<p>It was able to find matches for 20 unique routines from <code class="language-plaintext highlighter-rouge">f14.exe</code> built with default flags, which constitutes a meager 3% of F15SE2’s <code class="language-plaintext highlighter-rouge">egame.exe</code>. I tried with a version built with <code class="language-plaintext highlighter-rouge">/Zi</code> and it was similar. Increasing the maximum acceptable edit distance ratio does not seem to help much.</p>

<h2 id="great-so-all-that-for-nothing">Great, so all that for nothing?</h2>

<p>Depends on how you look at it. For me, it’s really all about the journey, and I could not have moved on with the reconstruction without settling this issue. Also, I’m still convinced the <em>Fleet Defender</em> codebase will come in useful, and getting familiar with it gave me a new source of insight and inspiration.</p>

<p>I also fixed a bunch of serious bugs in my tooling, further enabling it to work with more than just F15SE2, and implemented features which will come in handy again when we decide to branch out to F-19 and F-117 in the future. All of this stuff is available in <a href="https://github.com/neuviemeporte/mzretools">mzretools v1.0.0</a>.</p>

<p>Additionally, I am still not convinced my signature search is 100% reliable. For one, I’m reusing an enum for the instruction class (mov/sub/jump/…) which is an abstract representation of the opcode, and it retains some differences between instructions that I really should make more vague, like the fact that it uselessly distinguishes between call/jmp and their far equivalents. For memory operands, it remembers if an instruction used an 8- or a 16bit offset for a data reference, which can also introduce differences into otherwise equivalent code. There could also be some serious bugs that make it miss matches for some other reason, who knows. Something <strong>must</strong> be working because when I search for signatures from F-19’s <code class="language-plaintext highlighter-rouge">EGAME.EXE</code> in F15SE2, it’s able to match 38% of the code with a threshold of 10% difference. When I increase the threshold to 30%, the matched code jumps up to 53%, but what I’m saying is that there still could be breakthroughs made on the tooling which would let it find more matches for <code class="language-plaintext highlighter-rouge">f14.exe</code>.</p>

<p>Finally, the leaked code itself introduces problems. Parts of it are disabled with <code class="language-plaintext highlighter-rouge">#ifdef YEP/#endif</code> - WTF??? Parts are commented out, with no way of telling whether they have been that way forever, or just disabled somewhere along the way. More work needs to be done to clean it up, generate more signatures and try again with different variants of the build.</p>

<p>With that said and done, I really should get to reconstructing <code class="language-plaintext highlighter-rouge">egame.exe</code>. I won’t make any promises this time, but the day is getting close…</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry><entry><title type="html">New tools prevent repeat work, provide better game data insight</title><link href="/f15-se2/2025/01/29/newtooling2.html" rel="alternate" type="text/html" title="New tools prevent repeat work, provide better game data insight" /><published>2025-01-29T00:00:00+00:00</published><updated>2025-01-29T00:00:00+00:00</updated><id>/f15-se2/2025/01/29/newtooling2</id><content type="html" xml:base="/f15-se2/2025/01/29/newtooling2.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<p>I guess you could say I am procrastinating on moving on to the next executable to reconstruct after my recent success with making <code class="language-plaintext highlighter-rouge">START.EXE</code> runnable, but I had a couple ideas for the tooling which I wanted to check out first. So here’s a brief summary of what’s available in <a href="https://github.com/neuviemeporte/mzretools">mzretools 0.9.7</a>.</p>

<p>I already <a href="/f15-se2/2024/12/31/newtooling.html">fixed</a> a bunch of bugs that would likely limit my tools’ robustness when working on <code class="language-plaintext highlighter-rouge">EGAME.EXE</code>, and together with these new features, I think I’m in a good place to take on the new work. Also, changing gears once in a while to work on algorithmic stuff in a modern codebase helps me keep from burning out on the ancient assembly opcodes.</p>

<h1 id="mzdup">mzdup</h1>

<p>With the way the game is structured with the 3 main executables for the main stages of the game, I’m pretty sure there must be at least <em>some</em> code duplication between them. They are avoiding most of it by putting the common graphics/sound/input routines into <a href="/f15-se2/2023/07/12/overlays.html">overlays</a> and dynamically loading in the code at runtime, but these are mostly low-level operations for things like drawing a sprite or a string at a particular screen location. Meanwhile, the code for <a href="https://github.com/neuviemeporte/f15se2-re/blob/main/src/start4.asm#L1534">loading the sprite data</a> and <a href="https://github.com/neuviemeporte/f15se2-re/blob/main/src/start4.asm#L938">setting up the overlays</a> themselves is present out in the open in <code class="language-plaintext highlighter-rouge">START.EXE</code>, and I’m pretty sure <code class="language-plaintext highlighter-rouge">EGAME.EXE</code> will need to do pretty much the same thing. The next executable is going to be challenging due to its larger size and the kind of thing it does (3D rendering and projection math), so I’m going to milk the work that I already did for the first one as much as I can.</p>

<p>The task then is to find duplicate routines from <code class="language-plaintext highlighter-rouge">START</code> in <code class="language-plaintext highlighter-rouge">EGAME</code>. Obviously, I cannot just search for the binary data because the offsets encoded in the 8086 assembly instructions are going to be different, and I won’t find anything longer than a couple instructions. Also, there is no guarantee that the routines even contain the exact same instructions - the code might have been tweaked between linking one executable vs the other. They might even have used a different assembler which used a different encoding for the instructions, who knows?</p>

<p>I am aware that this is a solved problem because of the existence of <a href="https://docs.hex-rays.com/user-guide/signatures/flirt">FLIRT</a> in IDA, but for one, generating FLIRT signatures from binaries requires the paid version of IDA which I don’t have, and it would require me to go into a Win XP VM which I use to run IDA. I wanted the experience to be more streamlined. Also, it seemed like an interesting challenge which I figured I could pull off, which is as good an excuse as any in my book. 😉</p>

<p>The first problem is similar to what I encountered when initially implementing <code class="language-plaintext highlighter-rouge">mzdiff</code>; I needed to compare instructions between two divergent executables that I wanted to make identical, so I have some facilities implemented to strip out the offsets from parsed instructions. The instructions are also represented in an abstract way in my code, each belonging to a “class” that is more or less equivalent to an assembly mnemonic (<code class="language-plaintext highlighter-rouge">mov</code>/<code class="language-plaintext highlighter-rouge">cmp</code>/<code class="language-plaintext highlighter-rouge">call</code>). So it wasn’t difficult to format the instructions of a routine into a string of abstract “signatures” which are just the optional segment override prefix, the class, and the operand types (mem/reg/immediate) <a href="https://github.com/neuviemeporte/mzretools/commit/cf92f11137f384afdd998f65460d850fa57b0f21#diff-679ea22dfd0ffd35b16186f70f94d92c84f8db9fc28ecca2bb51573071944fceR658">fused together</a> to form a 32bit “character” of the string I will be searching for.</p>
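
<p>As a rough illustration of fusing those three pieces of information into a single 32bit value, consider the sketch below. The field widths, the numbering and all the names are hypothetical; the real enums and bit layout live in the linked mzretools commit and differ from this.</p>

```python
# Hypothetical numbering, chosen only for this illustration.
SEGMENT_PREFIX = {None: 0, "es": 1, "cs": 2, "ss": 3, "ds": 4}
INSTRUCTION_CLASS = {"mov": 1, "cmp": 2, "call": 3, "jmp": 4, "push": 5, "pop": 6}
OPERAND_TYPE = {None: 0, "reg": 1, "mem": 2, "imm": 3}


def signature(prefix, iclass, op1=None, op2=None):
    """Pack prefix, class and operand types into one 32bit 'character'."""
    return (
        (SEGMENT_PREFIX[prefix] << 24)
        | (INSTRUCTION_CLASS[iclass] << 16)
        | (OPERAND_TYPE[op1] << 8)
        | OPERAND_TYPE[op2]
    )


def routine_signature(instructions):
    """Turn a list of (prefix, class, op1, op2) tuples into a signature string."""
    return [signature(*instruction) for instruction in instructions]
```

<p>Because concrete offsets and register names never enter the value, <code class="language-plaintext highlighter-rouge">mov ax, [0x1234]</code> and <code class="language-plaintext highlighter-rouge">mov bx, [0x5678]</code> both become <code class="language-plaintext highlighter-rouge">signature(None, "mov", "reg", "mem")</code>, while adding an <code class="language-plaintext highlighter-rouge">es:</code> override yields a different value.</p>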

<p>The second problem is kind of brainy. Basically, I need to calculate the <a href="https://en.wikipedia.org/wiki/Edit_distance">edit distance</a> (from now on, “ED”) between the two strings formed by the signatures generated from the instructions of two routines to tell me how many instructions need to be modified to make the routines identical. However, I was never good at implementing fancy algorithms from descriptions in research papers, and dynamic programming is always giving me the willies. So I ended up <a href="https://github.com/roy-ht/editdistance">“borrowing”</a> the implementation from somebody else. Thanks!</p>
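
<p>For reference, the textbook dynamic programming formulation of the edit distance fits in a dozen lines. This is the plain Levenshtein algorithm working on two lists of signature values, shown only to illustrate what is being computed; it is not the borrowed implementation, which is considerably faster.</p>

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of signature values."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]
```

<p>Two routines whose signature strings differ in one instruction, or where one has a single extra instruction, end up with a distance of 1.</p>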

<p>Now it’s just a matter of brute forcing the solution, trying to match every routine from my “known” executable against every potential entrypoint location in the “unknown” one. I have some simple heuristics to skip calculating the ED between routines whose instruction counts differ so much that the ED clearly cannot stay within the maximum difference threshold. Also, the ED routine will interrupt the calculation as soon as it determines that the threshold will not be satisfied. That’s about as much as I can bother with in terms of optimization. I implemented a new tool called <code class="language-plaintext highlighter-rouge">mzdup</code> as a thin frontend for calling the appropriate analysis routine, so let’s give it a spin:</p>

<pre>
🔵 walk and discover routines and data in egame.exe first, I know main() is at 0x10 from looking in IDA
ninja@RYZEN:f15se2-re$ mzmap ../ida/egame.exe:0x10 map/egame.map # 
Loading executable ../ida/egame.exe at segment 0x1000
Analyzing code within extents: 1000:0000-3000:8f6f/028f70
Done analyzing code, examined 6013 locations
DEBUG: Dumping visited map of size 0x28f70 starting at 0x10000 to routines.visited
Building routine map from search queue contents: 393 routines over 5 segments
🔴 need to take a look at this...
ERROR: Unable to find a segment for routine map offset 0x32740, ignoring remainder 
Saving routine map (routines = 393) to map/egame.map, reversing relocation by 0x1000
Please review the output file (map/egame.map), assign names to routines/segments
You may need to resolve inaccuracies with routine block ranges manually; this tool is not perfect
🔵 search for duplicates of routines from start.exe in egame.exe
ninja@RYZEN:f15se2-re$ mzdup ../ida/start.exe map/start.map ../ida/egame.exe map/egame.map
🔴 and again
ERROR: Unable to find a segment for routine map offset 0x32740, ignoring remainder
Searching for duplicates of 255 routines among 393 candidates, minimum instructions: 15, maximum distance: 1
Found duplicates for 39 (unique 39) routines out of 255 routines, ignored 139
Saving routine map (routines = 393) to map/egame.map.dup, reversing relocation by 0x1000
🔵 initial look at results, the tool adds the 'duplicate' annotation to the routine along with an informative comment before it
ninja@RYZEN:f15se2-re$ cat map/egame.map.dup | grep duplicate
# Routine routine_264 is a potential duplicate of routine <b>sub_154A1</b>, block 1000:1d6e-1000:1e0d/0000a0 differs by 0 instructions
routine_264: Code1 NEAR 1d6e-1e0d R1d6e-1e0d duplicate
# Routine routine_3 is a potential duplicate of routine <b>installCBreakHandler</b>, block 1000:3bec-1000:3c0e/000023 differs by 0 instructions
routine_3: Code1 NEAR 3bec-3c0e R3bec-3c0e duplicate
# Routine routine_34 is a potential duplicate of routine <b>setTimerIrqHandler</b>, block 1000:3c78-1000:3cb5/00003e differs by 0 instructions
routine_34: Code1 NEAR 3c78-3cb5 R3c78-3cb5 duplicate
# Routine routine_63 is a potential duplicate of routine <b>sub_119D4</b>, block 1000:3df2-1000:3e59/000068 differs by 0 instructions
routine_63: Code1 NEAR 3df2-3e86 R3df2-3e59 U3e5a-3e5a R3e5b-3e86 duplicate
# Routine routine_94 is a potential duplicate of routine <b>sub_11A69</b>, block 1000:3e87-1000:3eb0/00002a differs by 0 instructions
routine_94: Code1 NEAR 3e87-3edb R3e87-3eb0 U3eb1-3eb1 R3eb2-3edb duplicate
# Routine routine_85 is a potential duplicate of routine <b>openFile</b>, block 1000:ddc4-1000:de1a/000057 differs by 0 instructions
routine_85: Code1 NEAR ddc4-de1a Rddc4-de1a Rdf80-dfbb duplicate
# Routine routine_93 is a potential duplicate of routine <b>fileClose</b>, block 1000:de72-1000:de92/000021 differs by 0 instructions
routine_93: Code1 NEAR de72-de92 Rde72-de92 duplicate
# Routine routine_61 is a potential duplicate of routine <b>showPicFile</b>, block 1000:e0aa-1000:e11b/000072 differs by 0 instructions
routine_61: Code1 NEAR e0aa-e11b Re0aa-e11b duplicate
# Routine routine_90 is a potential duplicate of routine <b>decodePicRow</b>, block 1000:e262-1000:e28b/00002a differs by 0 instructions
routine_90: Code1 NEAR e262-e28b Re262-e28b duplicate
# Routine routine_111 is a potential duplicate of routine <b>picReadDataAndMakeDict</b>, block 1000:e28c-1000:e2d2/000047 differs by 0 instructions
routine_111: Code1 NEAR e28c-e2d2 Re28c-e2d2 duplicate
# Routine routine_280 is a potential duplicate of routine <b>picMakeDict</b>, block 1000:e2d3-1000:e308/000036 differs by 0 instructions
routine_280: Code1 NEAR e2d3-e308 Re2d3-e308 duplicate
# Routine routine_192 is a potential duplicate of routine <b>dictionaryLookup</b>, block 1000:e382-1000:e430/0000af differs by 0 instructions
[...]
</pre>

<p>The default maximum ED of one is pretty strict, but I found that when using a higher threshold, like 5, especially with a lower minimum routine length threshold (10), I was getting a bunch of false positives from tiny functions. Fiddling with these values might yield more results, but this is just a demonstration of the concept.</p>
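
<p>The two shortcuts mentioned before the transcript (skipping pairs whose instruction counts differ too much, and abandoning the calculation once the threshold can no longer be met) can be sketched on top of the textbook dynamic programming edit distance. This is an illustration under that assumption, with a hypothetical function name, not the code <code class="language-plaintext highlighter-rouge">mzdup</code> actually runs.</p>

```python
def bounded_edit_distance(a, b, max_dist):
    """Return the edit distance of a and b, or None if it exceeds max_dist."""
    # Shortcut 1: every surplus element costs at least one insertion or deletion.
    if abs(len(a) - len(b)) > max_dist:
        return None
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        # Shortcut 2: the smallest value in a row never shrinks in later rows,
        # so once a whole row is over the limit, the final distance will be too.
        if min(current) > max_dist:
            return None
        previous = current
    return None if previous[-1] > max_dist else previous[-1]
```

<p>With a maximum of 1, a pair of three-instruction routines that share nothing is rejected after the second row, without ever filling in the rest of the table.</p>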

<p>Out of the 39 duplicates found, most are libc functions. But it did find the Ctrl-Break handler, the timer interrupt handler, and a bunch of the <code class="language-plaintext highlighter-rouge">.PIC</code> graphical format-related decoding functions, as expected. Also, some routines whose purpose I don’t even know yet from <code class="language-plaintext highlighter-rouge">START</code> have been identified (<code class="language-plaintext highlighter-rouge">sub_...</code>) in <code class="language-plaintext highlighter-rouge">EGAME</code>. I will merge the <code class="language-plaintext highlighter-rouge">egame.map.dup</code> file with the <code class="language-plaintext highlighter-rouge">egame.map</code> file (or replace the latter outright after confirming everything else looks fine) and mark the relevant routines in my IDA project. It’s not a whole lot, but it’s something.</p>

<p>The error visible in the mapping stage happens because the code walker was unable to discover the data segment, and after it’s done walking all the code paths and starts calculating which regions of the executable’s load module belong to which routines, it reaches the boundaries of the current code segment (remember, segments are max 64kB under DOS), and doesn’t have a segment where it can put that location, so it ignores the rest of the address space. It does not influence the routine discovery much, because we’re in the data segment already and there’s no more code past that, but it is something that will need addressing because it means no variables from the data segment will be discovered. For now, the tooling has very limited capabilities of discovering segments, basically it can only recognize a data or stack segment when seeing a <code class="language-plaintext highlighter-rouge">mov</code> to the <code class="language-plaintext highlighter-rouge">DS</code> or <code class="language-plaintext highlighter-rouge">SS</code> registers from either an immediate, or another register with a known value. It worked fine for <code class="language-plaintext highlighter-rouge">START</code> but in this case, establishing the data segment happened in CRT0 code with <code class="language-plaintext highlighter-rouge">mov di, 0x1234 - mov ss, di - push ss - pop ds</code>, and I don’t trace pushes or pops. At some point, I’m going to have to implement actual 8086 instruction execution, but that’s a lot of work that I don’t want to get into right now. Or I could let the user specify the segments manually, but where would be the fun in that? 😉</p>
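
<p>The limitation is easier to see with a toy model of this kind of register tracking. The sketch below works on made-up <code class="language-plaintext highlighter-rouge">(mnemonic, destination, source)</code> tuples rather than real decoded instructions, and only understands <code class="language-plaintext highlighter-rouge">mov</code>; it is an assumption about the general approach, not a copy of the walker’s logic.</p>

```python
def discover_segments(instructions):
    """Collect segment values loaded into DS/SS by a straight-line run of code.

    Each instruction is a hypothetical (mnemonic, destination, source) tuple;
    the source is an int for an immediate, a register name, or None.
    """
    known = {}     # register name -> value known at this point
    segments = {}  # "ds"/"ss" -> discovered segment value
    for mnemonic, dest, src in instructions:
        if mnemonic != "mov":
            # push/pop and everything else is not traced: whatever ends up
            # in the destination register is unknown from here on.
            known.pop(dest, None)
            continue
        value = src if isinstance(src, int) else known.get(src)
        if value is None:
            known.pop(dest, None)
            continue
        known[dest] = value
        if dest in ("ds", "ss"):
            segments[dest] = value
    return segments


# The CRT0 sequence quoted above, expressed as toy tuples.
CRT0_LIKE = [
    ("mov", "di", 0x1234),
    ("mov", "ss", "di"),
    ("push", None, "ss"),
    ("pop", "ds", None),
]
```

<p>Running it over <code class="language-plaintext highlighter-rouge">CRT0_LIKE</code> finds the stack segment but never learns the value of <code class="language-plaintext highlighter-rouge">DS</code>, because that value travels through the stack rather than through a <code class="language-plaintext highlighter-rouge">mov</code>.</p>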

<h1 id="mzptr">mzptr</h1>

<p>While fixing the <a href="/f15-se2/2025/01/09/start-runs.html">multitude of bugs</a> that prevented <code class="language-plaintext highlighter-rouge">START</code> from working, some of them predictably turned out to be variables which I thought were straight numeric values, that actually ended up being pointers to different variables. Since the layout of the reconstruction differs from the original, the hard-baked pointers don’t match after rebuilding, and stuff breaks. I found and resolved a bunch of them, but that gave me an idea of trying to do it semi-automatically. Essentially, if I knew where the variables were, I could just brute force search the raw contents of the data segment for numbers which match the offsets of known variables. Sure, I’m bound to get some false positives, but maybe also catch some pointers that flew under the radar?</p>

<p>The biggest challenge here was actually “knew where the variables were”. I am identifying memory operands while comparing with <code class="language-plaintext highlighter-rouge">mzdiff</code>, but <code class="language-plaintext highlighter-rouge">mzmap</code> did not take note of data while walking the executable, so it was a fair bit of development effort as well as refactoring (<code class="language-plaintext highlighter-rouge">RoutineMap</code> became <code class="language-plaintext highlighter-rouge">CodeMap</code> as it doesn’t only track routines anymore) to get it done. Now, the map file that <code class="language-plaintext highlighter-rouge">mzmap</code> spits out will also contain found memory operand offsets as potential variable locations. Next, it was a matter of implementing the search. I’m scanning the data segment(s) byte by byte, extracting a 16bit value at every location by swapping the little-endian bytes, and comparing it with all the offsets of all the known variables - nothing subtle. In the future, the search could be extended to the code segment for any lingering pointers that could have been put there with assembly.</p>
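
<p>The scan itself is simple enough to show in full. Here is a minimal sketch of it, assuming the data segment is available as a <code class="language-plaintext highlighter-rouge">bytes</code> object and the known variable offsets as plain integers; the function name and return shape are made up for the example.</p>

```python
def find_pointer_candidates(data, variable_offsets):
    """Map each known variable offset to the data offsets where its value appears."""
    wanted = set(variable_offsets)
    hits = {}
    # Step one byte at a time: pointers need not be word-aligned.
    for position in range(len(data) - 1):
        # 16bit values are stored low byte first on the 8086.
        value = int.from_bytes(data[position:position + 2], "little")
        if value in wanted:
            hits.setdefault(value, []).append(position)
    return hits
```

<p>For example, scanning the bytes <code class="language-plaintext highlighter-rouge">00 c2 00 d5 00</code> for variables at <code class="language-plaintext highlighter-rouge">0x00c2</code> and <code class="language-plaintext highlighter-rouge">0x00d5</code> reports one candidate reference for each, at positions 1 and 3.</p>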

<p>Meanwhile, the new utility <code class="language-plaintext highlighter-rouge">mzptr</code> serves as a frontend for this capability. The useful feature it has is that it will sort the found references by the found count, with the idea being that variables with fewer matches are more likely to be genuine pointers. That’s because a variable located at an offset which is a small or non-distinct value like <code class="language-plaintext highlighter-rouge">0x0800</code> is very likely to spawn multiple false positives at many locations, whereas a value like <code class="language-plaintext highlighter-rouge">0x7fc3</code> is more likely to be represented once, or not at all. Additionally, within the same match count, variables are sorted by the offset at which they were found, which lets me spot sequential arrays of pointers. So again, let’s try it out:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ninja@RYZEN:f15se2-re$ mzptr ../ida/start.exe map/start.map
Search complete, found 528 potential references, unique: 132
Printing reference counts per variable, counts higher than 1 or 2 are probably false positives due to a low/non-characteristic offset
word_16BE2/16b5:0092/016be2: 1 reference @ Data1:0xa8
page1Num/16b5:0530/017080: 1 reference @ Data1:0x546
page2Num/16b5:0548/017098: 1 reference @ Data1:0x55e
unk_170B0/16b5:0560/0170b0: 1 reference @ Data1:0x576
🔵 an array of string pointers here
aLibya/16b5:00c2/016c12: 1 reference @ Data1:0x578
aVietnam/16b5:00d5/016c25: 1 reference @ Data1:0x57c
aMiddleEast/16b5:00dd/016c2d: 1 reference @ Data1:0x57e
aOtherAreas/16b5:00e9/016c39: 1 reference @ Data1:0x580
🔵 same here
aAcrossTheLineO/16b5:00f5/016c45: 1 reference @ Data1:0x582
aKeepingTheSeaL/16b5:0110/016c60: 1 reference @ Data1:0x584
aAmericaSLonges/16b5:012b/016c7b: 1 reference @ Data1:0x586
aEaglesVsMigs/16b5:0145/016c95: 1 reference @ Data1:0x588
aInsertYourScen/16b5:0154/016ca4: 1 reference @ Data1:0x58a
🔵 and here
aRookie/16b5:016e/016cbe: 1 reference @ Data1:0x58c
aPilot/16b5:0175/016cc5: 1 reference @ Data1:0x58e
aVeteran/16b5:017b/016ccb: 1 reference @ Data1:0x590
aAce/16b5:0183/016cd3: 1 reference @ Data1:0x592
aDemo/16b5:0187/016cd7: 1 reference @ Data1:0x594
aGetOffToAGoodS/16b5:018c/016cdc: 1 reference @ Data1:0x596
aForTheCasualPl/16b5:01a4/016cf4: 1 reference @ Data1:0x598
aForMoreSerious/16b5:01ba/016d0a: 1 reference @ Data1:0x59a
aTheUltimateCha/16b5:01d3/016d23: 1 reference @ Data1:0x59c
aLetSSeeWhatThi/16b5:01ea/016d3a: 1 reference @ Data1:0x59e
aNc/16b5:020b/016d5b: 1 reference @ Data1:0x5a0
aCe/16b5:020e/016d5e: 1 reference @ Data1:0x5a2
aJp/16b5:0211/016d61: 1 reference @ Data1:0x5a4
aNa/16b5:0214/016d64: 1 reference @ Data1:0x5a6
aNorthCape/16b5:0217/016d67: 1 reference @ Data1:0x5a8
aCentralEurope/16b5:0222/016d72: 1 reference @ Data1:0x5aa
aDesertStorm/16b5:0231/016d81: 1 reference @ Data1:0x5ac
[...]
🔴 these are all bogus; small, non-distinctive offset values
aPersianGulf/16b5:00c8/016c18: 8 references
crt0_end/16b5:0041/016b91: 8 references
fileHandle/16b5:4600/01b150: 15 references
aMsRunTimeLibra/16b5:0008/016b58: 20 references
unk_16B56/16b5:0006/016b56: 34 references
aOnc_2/16b5:0700/017250: 39 references
unk_16B57/16b5:0007/016b57: 45 references
byte_16B54/16b5:0004/016b54: 87 references
crt0_16B52/16b5:0002/016b52: 118 references
</code></pre></div></div>

<p>It found quite a lot of potential locations, but only 132 unique references, with the bulk of the overall 528 count being false positives for variables with non-distinct offsets, like <code class="language-plaintext highlighter-rouge">crt0_16B52</code> located at offset <code class="language-plaintext highlighter-rouge">ds:0002</code> (and there are a lot of 2’s in the data segment). But the single-instance references at the top of the listing are actually genuine, so I checked every single location in the data segment to make sure they weren’t hardcoded to the numeric value of the offset, which is a bit tedious, but still infinitely better than trying to figure it out when debugging. I didn’t actually find any missed references, just one case of the opposite, where a numeric value was replaced with an offset to a variable that should not have been.</p>

<p>Another neat thing to see is how the references form arrays of pointers at some location, like the names of the scenarios (Libya, Vietnam, …) starting at <code class="language-plaintext highlighter-rouge">Data1:0578</code> or the difficulty levels (Rookie, Pilot, …) at <code class="language-plaintext highlighter-rouge">Data1:058c</code>.</p>
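
<p>The ordering used in the listing, and the way consecutive single-match references line up into pointer arrays, can be sketched roughly as follows. This is an illustration in Python under my own assumptions, not the code in <code class="language-plaintext highlighter-rouge">mzptr</code>: the function names are hypothetical, the locations given for <code class="language-plaintext highlighter-rouge">fileHandle</code> are invented, and grouping by a fixed 2-byte stride is just one simple way to spot an array of near pointers.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">def order_references(found):
    """Order variables by match count first, then by the first location found."""
    return sorted(found.items(), key=lambda item: (len(item[1]), min(item[1])))


def pointer_runs(ordered, stride=2):
    """Group single-match variables whose locations are exactly `stride` bytes apart."""
    runs, current = [], []
    for name, locations in ordered:
        if len(locations) != 1:
            continue  # multiple matches are probably false positives
        if current and locations[0] - current[-1][1] != stride:
            runs.append(current)
            current = []
        current.append((name, locations[0]))
    if current:
        runs.append(current)
    # A lone reference is not an array, so keep only runs of two or more.
    return [run for run in runs if len(run) != 1]


found = {
    "aVeteran": [0x590],
    "fileHandle": [0x10, 0x20, 0x30],  # invented locations, for illustration only
    "aRookie": [0x58c],
    "page1Num": [0x546],
    "aPilot": [0x58e],
}
ordered = order_references(found)
runs = pointer_runs(ordered)</code></pre></figure>

<p>With this toy input, the three difficulty level strings end up in one run, while the isolated <code class="language-plaintext highlighter-rouge">page1Num</code> reference and the noisy <code class="language-plaintext highlighter-rouge">fileHandle</code> are left out.</p>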

<h1 id="conclusion">Conclusion</h1>

<p>Admittedly, the results are not groundbreaking, but this was something that I just needed to check out of curiosity. I was unable to fix the remaining bugs in <code class="language-plaintext highlighter-rouge">START</code> this way either, so it means it’s up for a new round of debugging in the near future. Anyway, I think this work will pay dividends in the future because:</p>

<ol>
  <li>I expect <code class="language-plaintext highlighter-rouge">END.EXE</code> (the debriefing stage which mostly just shows static images) to be more similar to <code class="language-plaintext highlighter-rouge">START</code> than <code class="language-plaintext highlighter-rouge">EGAME</code> is, so I expect to find a fair amount of duplication there,</li>
  <li>I will need to find pointers in both <code class="language-plaintext highlighter-rouge">EGAME</code> and <code class="language-plaintext highlighter-rouge">END</code>, so <code class="language-plaintext highlighter-rouge">mzptr</code> will see its share of work (once I can get the data segment discovery to work),</li>
  <li>The tools project is not only about F-15 SE2 and this could come in useful to somebody working on a different project,</li>
  <li>The other Microprose flight games on this engine (F-19/F-117) are likely going to contain some duplication, so being able to find matching routines is going to go a long way towards supporting them some day.</li>
</ol>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry><entry><title type="html">The first reconstructed executable is playable!</title><link href="/f15-se2/2025/01/09/start-runs.html" rel="alternate" type="text/html" title="The first reconstructed executable is playable!" /><published>2025-01-09T00:00:00+00:00</published><updated>2025-01-09T00:00:00+00:00</updated><id>/f15-se2/2025/01/09/start-runs</id><content type="html" xml:base="/f15-se2/2025/01/09/start-runs.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<p>This is just a short update to share a significant milestone: the first reconstructed executable (<code class="language-plaintext highlighter-rouge">START.EXE</code>) is now playable in the original game.</p>

<p>I “finished” the reconstruction a few months ago, meaning all of the code that was generated from C source has been transcribed back into identical C. I left the assembly routines as is (meaning as generated from the IDA listing), except for some variables having meaningful names, and some comments, both carried over from research done in IDA. Also, the contents of the data segment have been (and still are) generated from assembly. I am not sure how the reconstruction will behave after these are moved back to C, but probably there will be some fallout in the form of bugs to fix. I have strived to replace all hard-baked offsets with references to symbols, but still something might have slipped through the cracks.</p>

<p>In any case, I was pretty surprised that the reconstruction did not run given that the code was “identical”, as attested by my <code class="language-plaintext highlighter-rouge">mzdiff</code> tool. But the thing is, for instructions like <code class="language-plaintext highlighter-rouge">mov ax, 0x1234</code> the tool cannot tell if the immediate value is some computational constant or the offset of a variable, so it does not consider immediate value differences as straight-up mismatches, period. This has the potential to backfire badly, and after I had fixed the big problem with the incorrect value being set for the data segment <a href="/f15-se2/2025/01/01/unstart2.html">in the previous post</a>, most problems turned out to have been caused by typos where I put an incorrect immediate value, and <code class="language-plaintext highlighter-rouge">mzdiff</code> ignored the difference. These were usually small numbers, so I actually put a silly heuristic into the latest <code class="language-plaintext highlighter-rouge">mzdiff</code> to highlight instructions differing on immediate values in bright red as a warning if the value of the immediate is less than <code class="language-plaintext highlighter-rouge">0xff</code>. It actually came in pretty handy and I was able to find a bunch more that needed resolving.</p>
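
<p>The heuristic is simple enough to show as a few lines of Python. The sketch below is an illustration only, not the code in <code class="language-plaintext highlighter-rouge">mzdiff</code>: the function name and the returned labels are hypothetical, and it assumes that a warning is raised when the smaller of the two differing immediates falls under the threshold, which the description above does not spell out.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">SMALL_IMMEDIATE_LIMIT = 0xff  # threshold named in the text


def classify_immediate_difference(reference_value, target_value):
    """Decide how a pair of immediates from matching instructions is reported.

    Returns "match" for equal values, "warn" for a suspicious difference
    between small values, and "ignore" for a difference that is most
    likely just a relocated variable offset.
    """
    if reference_value == target_value:
        return "match"
    # Assumption: testing the smaller value is enough to flag typos in constants.
    if min(reference_value, target_value) in range(SMALL_IMMEDIATE_LIMIT):
        return "warn"
    return "ignore"


same = classify_immediate_difference(0x1234, 0x1234)
moved_offset = classify_immediate_difference(0x1234, 0x5678)
likely_typo = classify_immediate_difference(4, 6)</code></pre></figure>

<p>Large differing values are treated as offsets that moved with the data layout, while a small constant that differs is far more likely to be a typo in the C source.</p>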

<p>In short, I managed to solve a bunch of problems. Some were caused by bugs in <code class="language-plaintext highlighter-rouge">mzdiff</code>, which didn’t catch some edge cases of differing instructions (these were also fixed in the tooling repo), but most were immediate value mismatches, or leftover numerical values in data segment variables which were actually supposed to be pointers to other variables.</p>

<p>Some of the issues I resolved:</p>

<ul>
  <li>crashing upon entering the pilot select screen - caused by a wrong <code class="language-plaintext highlighter-rouge">do..while</code> loop</li>
  <li>memory corruption from the routine to clear the screen overwriting the game code instead of video memory</li>
  <li>keyboard input other than enter not working on the pilot select screen</li>
  <li>wrong location of blinking cursor on pilot select screen</li>
  <li>mission generation routine freezing, stuck in an infinite loop - caused by an invalid read size from a terrain (<code class="language-plaintext highlighter-rouge">.3dt</code>) file</li>
  <li>crashing after displaying the generated mission, just before termination and entering the flight engine, again caused by the clear screen routine</li>
</ul>

<table class="imgcaption">
  <thead>
    <tr>
      <th style="text-align: center"><img src="/images/start_glitch.webp" alt="a glitch on the roster screen" class="center-image" /></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">One of the glitches I ran into on the pilot select screen</td>
    </tr>
  </tbody>
</table>

<p>After all that, the reconstructed executable let me pick a pilot and scenario, the mission was generated and displayed, the executable terminated cleanly and the loader executed the mission in the flight engine. Which is pretty great.</p>

<p>However, I can still see some problems when in flight:</p>

<ul>
  <li>the map MFD in the cockpit looks somewhat broken, has strange colors</li>
  <li>some HUD symbology has incorrect colors also</li>
  <li>the missile count is 1/4/0, which is pretty strange</li>
  <li>exiting the flight engine with Alt-Q displays a libc error message about a null pointer assignment</li>
</ul>

<p>All of these can be fixed later, but in addition, there is still work to be done on the reconstruction front:</p>

<ul>
  <li>all of the routines which were originally written in assembly need porting over to C</li>
  <li>the data segment is still generated from assembly and needs porting to C</li>
  <li>the code still contains placeholder routine and variable names for places where the intent of the code is not understood, so it needs more research and experimentation.</li>
</ul>

<p>The research part should be much easier now, since the code can be instrumented, and I already have a bunch of trace logs implemented and working both from C and assembly – use the <code class="language-plaintext highlighter-rouge">make debug</code> target to build a version with traces enabled, these are written to <code class="language-plaintext highlighter-rouge">f15.log</code> in the game directory. Also, there’s probably a bunch of research that can be carried over from <a href="https://github.com/debugcom/Hacking-F117A">debugcom’s findings</a> on the mission generator in F19/F117, and the <a href="https://github.com/alekasm/f14">source code leak for F14</a> could provide insights also.</p>

<p>I will be going back to <code class="language-plaintext highlighter-rouge">START.EXE</code> at some point, but for now I am eager to jump into the next executable, <code class="language-plaintext highlighter-rouge">EGAME.EXE</code> which is really the meat of the game. However, since it’s bound to contain a bunch of duplicate routines with <code class="language-plaintext highlighter-rouge">START.EXE</code>, I’m going to be switching gears for a while and going back to the tooling, where I plan to implement a new tool for identifying similar routines in different executables, in the hope it will save me some time, especially when I get to the final <code class="language-plaintext highlighter-rouge">END.EXE</code> which I expect to mostly consist of routines shared with <code class="language-plaintext highlighter-rouge">START.EXE</code>, since all it does is show some backgrounds, sprites and text.</p>

<p>As always, I will share updates of new developments once something significant happens.</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry><entry><title type="html">The thing won’t START, Part 2</title><link href="/f15-se2/2025/01/01/unstart2.html" rel="alternate" type="text/html" title="The thing won’t START, Part 2" /><published>2025-01-01T00:00:00+00:00</published><updated>2025-01-01T00:00:00+00:00</updated><id>/f15-se2/2025/01/01/unstart2</id><content type="html" xml:base="/f15-se2/2025/01/01/unstart2.html"><![CDATA[<p>(…continued from  <a href="/f15-se2/2024/12/31/unstart.html">Part 1</a>)</p>

<p>In some ways, I think keeping this journal was one of the better decisions on this project. Never was one for talking to a rubber duck, but somehow putting thoughts to words gives me a better view of the overall picture, and keeping a record of all the stuff I did doesn’t hurt either; I already found myself going back to old posts to remind myself of how or why I used to do something in the past. But the best part has to be that writing brings out new ideas whenever I’m stuck. So it was this time. No sooner than ink had dried on my previous post, I already had a bunch of new ideas on how to proceed, and by New Year’s Day I had the recreation running.</p>

<p>There have been a bunch of pretty surprising coincidences involved in this bug that made it act wacky, so read below on how I managed to track it down and resolve it.</p>

<h1 id="introspection">Introspection</h1>

<p>The problem I was having with the freezing executable is that the log output was confusing. Sometimes it would appear to be trimmed. Sometimes there was more of it, sometimes just a little. The code seemed to be working slowly, but delta measurements showed little to no delay between the lines. In essence, trying to look at the game’s behaviour through the logs just caused more confusion, leading me to question my logging framework.</p>

<p>Then I realized that the confusion could be eliminated if I wasn’t looking at the output through the logfile after the fact, but directly on the terminal as it was happening. Then I could visually see how long stuff was taking. Of course, the game switches to graphical mode which screws up the text display, but I have the source code now, so I can just comment it out temporarily. So that’s what I ended up doing.</p>

<p><img src="/images/start-row0.png" alt="logs on row 0" class="center-image" />
<img src="/images/start-row2.png" alt="logs on row 2" class="center-image" /></p>

<p>Seeing the game that was a black box up until recently now spill its guts all over the console is pretty neat, I think. In any case, as seen from the first screen, the game pauses immediately after entering the decoding loop for row zero, and it appears to be waiting for input! When I pressed the Enter key, you can see it progressed up to row 2, before stopping on input again. What is going on here?</p>

<h1 id="why-is-it-waiting-for-keyboard">Why is it waiting for keyboard?</h1>

<p>I figured since it is trying to get input from the console, it probably must be invoking interrupt 16 at some point, so I placed an interrupt breakpoint with <code class="language-plaintext highlighter-rouge">bpint 16</code> in Dosbox. Sure enough, it fired:</p>

<p><img src="/images/start-int16.png" alt="breaking on int 16 in debugger" class="center-image" /></p>

<p>Looking at the contents of the stack in the data overview pane, I can see the address of the segment where I know the executable was loaded (<code class="language-plaintext highlighter-rouge">0x28DA</code>), along with the offset <code class="language-plaintext highlighter-rouge">0x4839</code>. That is the function which invoked the interrupt. By pointing the code overview to that address, and searching for the instructions in my assembly file, I soon had my culprit:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="nf">read512FromFileIntoBuf</span> <span class="nv">proc</span> <span class="nv">near</span>
    <span class="nf">push</span> <span class="nb">ds</span>
    <span class="nf">mov</span> <span class="nb">ah</span><span class="p">,</span> <span class="mh">3Fh</span>
    <span class="nf">mov</span> <span class="nb">bx</span><span class="p">,</span> <span class="ow">seg</span> <span class="nv">startData</span>
    <span class="nf">mov</span> <span class="nb">ds</span><span class="p">,</span> <span class="nb">bx</span>
    <span class="nf">mov</span> <span class="nb">bx</span><span class="p">,</span> <span class="nv">_tmpFileHandle</span> <span class="c1">; device/file handle to use is passed in a global variable</span>
    <span class="nf">mov</span> <span class="nb">cx</span><span class="p">,</span> <span class="mh">200h</span> <span class="c1">;read 512 bytes at most (int 21 returns number of chars)</span>
    <span class="nf">mov</span> <span class="nb">dx</span><span class="p">,</span> <span class="nv">offset</span> <span class="nv">_fileReadBuf</span>
    <span class="nf">int</span> <span class="mh">21h</span> <span class="c1">;DOS - 2+ - READ FROM FILE WITH HANDLE</span>
    <span class="nf">jnb</span> <span class="nv">short</span> <span class="nv">readSuccess</span>
    <span class="nf">mov</span> <span class="nb">dx</span><span class="p">,</span> <span class="nv">offset</span> <span class="nv">_aReadError</span> <span class="c1">;"Read error$"</span>
    <span class="nf">mov</span> <span class="nb">cx</span><span class="p">,</span> <span class="mh">0FFFFh</span>
    <span class="nf">jmp</span> <span class="nv">short</span> <span class="nv">errorAndExit</span>
    <span class="nf">nop</span>
<span class="nl">readSuccess:</span>
    <span class="nf">pop</span> <span class="nb">ds</span>
    <span class="nf">retn</span>
<span class="nf">read512FromFileIntoBuf</span> <span class="nv">endp</span></code></pre></figure>

<p>It does not appear to be calling <code class="language-plaintext highlighter-rouge">int 16</code> directly, but it does try to read from a file. I placed a breakpoint on this routine, and sure enough, the value that it uses for the file handle is zero, meaning it’s reading from <code class="language-plaintext highlighter-rouge">stdin</code> - mystery solved. Presumably, the <code class="language-plaintext highlighter-rouge">int 21</code> handler invokes the keyboard interrupt when it sees that the device to act upon is the standard input.</p>

<h1 id="ok-but-why-isnt-it-reading-the-pic-file">Ok, but why isn’t it reading the PIC file?</h1>

<p>This routine is used by <code class="language-plaintext highlighter-rouge">showPicFile</code> which I was instrumenting with logs before, albeit indirectly; it sets a pointer to the read routine along with the destination buffer and the file handle, then some other routine invokes <code class="language-plaintext highlighter-rouge">read512FromFileIntoBuf</code> through the pointer:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="nf">_showPicFile</span> <span class="nv">proc</span> <span class="nv">near</span>
    <span class="nf">handle</span> <span class="err">=</span> <span class="kt">word</span> <span class="nv">ptr</span> <span class="mi">4</span>
    <span class="nf">pageNum</span> <span class="err">=</span> <span class="kt">word</span> <span class="nv">ptr</span> <span class="mi">6</span>
    <span class="nf">push</span> <span class="nb">bp</span>
    <span class="nf">mov</span> <span class="nb">bp</span><span class="p">,</span> <span class="nb">sp</span>
    <span class="nf">push</span> <span class="nb">di</span>
    <span class="nf">push</span> <span class="nb">si</span>
    <span class="nf">push</span> <span class="nb">es</span>
    <span class="nf">push</span> <span class="nb">bp</span>
    <span class="nf">trace</span> <span class="nv">msg1</span><span class="p">,</span><span class="nb">bp</span><span class="o">+</span><span class="mi">4</span><span class="p">,</span><span class="nb">bp</span><span class="o">+</span><span class="mi">6</span>
    <span class="nf">mov</span> <span class="nb">ax</span><span class="p">,</span> <span class="nv">offset</span> <span class="nv">read512FromFileIntoBuf</span>
    <span class="nf">mov</span> <span class="nv">_readFromFilePtr</span><span class="p">,</span> <span class="nb">ax</span>
    <span class="nf">mov</span> <span class="nb">ax</span><span class="p">,</span> <span class="p">[</span><span class="nb">bp</span><span class="o">+</span><span class="nv">handle</span><span class="p">]</span>
    <span class="nf">mov</span> <span class="nv">_tmpFileHandle</span><span class="p">,</span> <span class="nb">ax</span>
    <span class="c1">; ...</span></code></pre></figure>

<p>I placed a break at the location and I could see that the value of the file handle was <code class="language-plaintext highlighter-rouge">7</code>. Yet when <code class="language-plaintext highlighter-rouge">read512...</code> is invoked, it reads the handle value as zero. I pointed the memory view window to the address and sure enough, it was zero in memory. Is somebody changing the value between its setting in <code class="language-plaintext highlighter-rouge">showPicFile</code> and its usage in <code class="language-plaintext highlighter-rouge">read512...</code>? I used a memory breakpoint (<code class="language-plaintext highlighter-rouge">bpm ds:16ba</code>) in the dosbox debugger, but no, the value did not change. So how come it’s 7 here and 0 over there? Then I noticed it. The data segment address was different.</p>

<h1 id="problem-between-chair-and-keyboard">Problem between chair and keyboard</h1>

<p>It was caused by the following lines in <code class="language-plaintext highlighter-rouge">read512...</code>:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm">    <span class="nf">mov</span> <span class="nb">bx</span><span class="p">,</span> <span class="ow">seg</span> <span class="nv">startData</span>
    <span class="nf">mov</span> <span class="nb">ds</span><span class="p">,</span> <span class="nb">bx</span></code></pre></figure>

<p>The routine is resetting the data segment, but it’s not the right data segment. At the top of this assembly file, I have what I thought was a forward declaration for the data and BSS segments:</p>

<figure class="highlight"><pre><code class="language-nasm" data-lang="nasm"><span class="nf">DOSSEG</span>
<span class="nf">.8086</span>
<span class="nf">.MODEL</span> <span class="nv">SMALL</span>

<span class="nf">startData</span> <span class="ow">seg</span><span class="nv">ment</span> <span class="kt">word</span> <span class="nv">public</span> <span class="s">'DATA'</span>
<span class="nf">startData</span> <span class="nv">ends</span>
<span class="nf">startBss</span> <span class="ow">seg</span><span class="nv">ment</span> <span class="kt">byte</span> <span class="nv">public</span> <span class="s">"BSS"</span>
<span class="nf">startBss</span> <span class="nv">ends</span>

<span class="nf">DGROUP</span> <span class="nv">GROUP</span> <span class="nv">startData</span><span class="p">,</span><span class="nv">startBss</span>
<span class="nf">ASSUME</span> <span class="nb">DS</span><span class="p">:</span><span class="nv">DGROUP</span></code></pre></figure>

<p>However, in the autogenerated assembly file that contains all the actual data, I switched over to the simplified/standard MASM segment declarations with <code class="language-plaintext highlighter-rouge">.DATA/.DATA?</code> and forgot about it. There is no such thing as <code class="language-plaintext highlighter-rouge">startData</code>, it’s just an empty segment that the assembler created, and it just happened to contain zeros at runtime, which got into the file handle, redirecting the read from the file to the standard input.</p>
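
<p>To see why the same variable read as 7 in one routine and 0 in the other, recall that in real mode the CPU forms a physical address from the segment register and the offset together. The following Python sketch shows the arithmetic; the two data segment values in it are hypothetical placeholders, only the return address <code class="language-plaintext highlighter-rouge">28DA:4839</code> and the offset <code class="language-plaintext highlighter-rouge">16ba</code> are taken from the debugging session above.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">def linear_address(segment, offset):
    """8086 real mode: shift the segment left by four bits and add the offset."""
    # The 8086 has 20 address lines, so the sum wraps around at one megabyte.
    return (segment * 16 + offset) % 0x100000


# The return address seen on the stack in the debugger.
caller = linear_address(0x28da, 0x4839)

# Hypothetical segment values: the real data segment, and the empty
# startData segment the assembler placed somewhere else.
real_ds = 0x3000
empty_startdata = 0x3400

handle_offset = 0x16ba  # offset used for the memory breakpoint
right_view = linear_address(real_ds, handle_offset)
wrong_view = linear_address(empty_startdata, handle_offset)</code></pre></figure>

<p>The offset is identical in both cases, but the two views land on different bytes of memory, which is why the memory breakpoint on the real variable never fired.</p>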

<h1 id="it-all-comes-together">It all comes together</h1>

<p>The seemingly irrational behaviour of the bug was because I was logging to a file, and staring at a black screen in the emulator. Sometimes I was pressing keys which made the row processing routine advance further, sometimes not. There is also a bug in vanilla DosBox where if the emulator window gains focus, it will sometimes start sending bogus keyboard input into the window until you hit a key. That must have kicked in the time I saw the routine advance past row 70 in the logs. Also, the keys were coming in pretty fast, so there was never a time difference of more than 1 second. My time delta calculation was correct after all: when I was looking at it on the console and waiting a couple of seconds between the keypresses, the printed delays were accurate.</p>

<p>Whew. That one was pretty wild. Time to fix it.</p>

<h1 id="will-it-start">Will it START?</h1>

<p>I removed the unneeded segment declaration from the assembly file, and replaced any references to <code class="language-plaintext highlighter-rouge">startData</code> with <code class="language-plaintext highlighter-rouge">@data</code>, which is the automatic MASM equate for the data segment. Switched the logging routine to print to a file again, and restored the switch to graphical mode back the way it was. Then I fired it up.</p>

<p><img src="/images/start-runs.png" alt="the executable running in the game" class="center-image" /></p>

<p>I swear, I’ve never been so happy to see the game title screen before. It does not matter that it crashed upon reaching the pilot selection screen. For it to display the splash screen, a bunch of important things must be going right:</p>

<ol>
  <li>The overlay drivers are being set up correctly</li>
  <li>The executable is healthy, able to call into C code, assembly code and the overlays plus return without issue</li>
  <li>The different layout of the data segment is not preventing the game from working, meaning (most of) the data references were resolved correctly</li>
</ol>

<p>Better still, this validates my methodology with using <code class="language-plaintext highlighter-rouge">mzdiff</code> to compare the code while reconstructing it, proving that the approach is viable. Sometimes when I’m looking at the diagram in <a href="https://github.com/neuviemeporte/mzretools?tab=readme-ov-file#mzretools">the tools’ README</a>, it makes me think it’s too convoluted and ridiculous, but that’s just what it takes to dissect and put the game back together. Without the tooling, none of this would have been any simpler – all the arrows would just be going to the person shape, and I would have to do all of this stuff manually anyway.</p>

<p>I must say I am pretty happy with my New Year’s present. Looking forward to getting the first part of the game fully running soon.</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(…continued from Part 1)]]></summary></entry><entry><title type="html">Improvements in tooling</title><link href="/f15-se2/2024/12/31/newtooling.html" rel="alternate" type="text/html" title="Improvements in tooling" /><published>2024-12-31T00:00:00+00:00</published><updated>2024-12-31T00:00:00+00:00</updated><id>/f15-se2/2024/12/31/newtooling</id><content type="html" xml:base="/f15-se2/2024/12/31/newtooling.html"><![CDATA[<p><small>(<em>This post is part of a <a href="/category/f15-se2.html">series</a> on the subject of my hobby project, which is recreating the C source code for the 1989 game <a href="/f15-se2/2022/06/05/origins.html">F-15 Strike Eagle II</a> by reverse engineering the original binaries.</em>)</small></p>

<p>This is a small update to let anyone interested know that I’ve not abandoned the project, but it’s been on a short hiatus. I’m getting back into it, albeit slowly, so I thought I would write up what I have been up to in the meantime.</p>

<p>Back around August of 2024, I “finished” the reconstruction of the C code for the first executable of F15 (START.EXE). I left the assembly routines mostly as they were, except for replacing all the hardcoded offsets with references to specific symbols, to make the executable independent of the data layout, so that it would link and run with customizations (particularly instrumentation for debugging). If successful, this would be a great milestone.</p>

<p>Unfortunately, the reconstructed binary does not work inside the game. It freezes with a black screen when starting up, before even showing the first splashscreen. I did some initial debugging, and the problem seems to be around decoding of the first splash screen image, but the results were inconclusive and I was getting a little bit burned out, so I decided to shift gears and spend some time on improving my <a href="https://github.com/neuviemeporte/mzretools">tooling</a>. In this post, I’m going to focus on the improvements I implemented recently.</p>

<p>The official excuse was that the <code class="language-plaintext highlighter-rouge">mzdiff</code> tool which I am using for comparing my reconstruction executable (“target”) to the original (“reference”) does not go into some assembly routines, specifically those which are not seen being called while walking the code. This is the case with some routines that are being called indirectly, like the timer interrupt handler; the walker has no way to figure that out. If it does not go into these routines, it does not compare them, and any discrepancies in those low-level routines would make the reconstruction fail.</p>

<p>There is no problem from the reference executable’s side, because the map file generated by the other tool, <code class="language-plaintext highlighter-rouge">mzmap</code> has been also manually tweaked by me to spell out all the missing routines, and where they are located, so <code class="language-plaintext highlighter-rouge">mzdiff</code> can go into it. But the tool doesn’t know what the corresponding address is in the target executable if it didn’t see a call it could derive the relationship from. So I implemented some search capabilities.</p>

<p>Actually, I’m lying. I started implementing the capability, then got bogged down with the implementation details, left it for a month, forgot what I was supposed to do next, found it hard to get back into, then spent a couple months in a loop of shame and guilt. But it’s done now, barely 4 months later.</p>

<h1 id="missed-routines-scrape-up-target-opcode-search">Missed routines scrape-up, target opcode search</h1>

<p>After the main comparison loop runs out of locations to compare, I am scanning the map of the reference executable for “missed” routines (i.e. ones that have not been compared) and insert them into the queue again for the main loop to visit. When it notices it does not have an address in the target, it will initiate the search, looking for corresponding instruction opcodes from the reference in the executable. It starts with a single instruction, and keeps adding more until there’s just a single, unambiguous candidate. Because the layouts of the executables differ, any offsets present in the instructions need to be erased and replaced with “wildcard bytes”, otherwise they would not have been found in the target.</p>

<p>Obviously, if a candidate location for comparison cannot be found, it follows that the routine is not present in the target and the comparison fails.</p>
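
<p>To make the idea a bit more concrete, here is a minimal Python sketch of such a search. It is not the actual <code class="language-plaintext highlighter-rouge">mzdiff</code> code; the names <code class="language-plaintext highlighter-rouge">find_matches</code> and <code class="language-plaintext highlighter-rouge">locate_routine</code> and the use of <code class="language-plaintext highlighter-rouge">None</code> as the wildcard byte are my assumptions for illustration only.</p>

```python
from typing import List, Optional, Sequence

WILDCARD = None  # hypothetical marker for an erased offset byte

def find_matches(pattern: Sequence[Optional[int]], haystack: bytes) -> List[int]:
    """Return every offset in haystack where pattern matches; WILDCARD matches any byte."""
    hits = []
    for start in range(len(haystack) - len(pattern) + 1):
        if all(p is WILDCARD or p == haystack[start + i] for i, p in enumerate(pattern)):
            hits.append(start)
    return hits

def locate_routine(instructions, target: bytes) -> Optional[int]:
    """Grow the pattern one instruction at a time until exactly one candidate is left."""
    pattern = []
    for instr in instructions:
        pattern.extend(instr)
        hits = find_matches(pattern, target)
        if not hits:
            return None      # routine is not present in the target
        if len(hits) == 1:
            return hits[0]   # single, unambiguous candidate
    return None              # ran out of instructions while still ambiguous

# Reference routine: push bp / mov bp,sp / mov ax,[????] with the offset erased.
reference = [[0x55], [0x8B, 0xEC], [0xA1, WILDCARD, WILDCARD]]
# Toy target holding two routines that share the same two-instruction prologue.
target = bytes([0x90, 0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x04, 0xC3,
                0x55, 0x8B, 0xEC, 0xA1, 0x34, 0x12, 0xC3])
found = locate_routine(reference, target)
```

<p>In this toy example the first two instructions match in two places, and only the third one (with its offset bytes wildcarded) narrows the search down to a single location.</p>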

<h1 id="going-into-the-weeds-rollback-capability">Going into the weeds, rollback capability</h1>

<p>The other big improvement came from a <a href="https://github.com/neuviemeporte/mzretools/issues/3">bug report</a> I got on GitHub. Apparently, the <code class="language-plaintext highlighter-rouge">mzmap</code> tool fails for Duke Nukem 1 and Bio Menace. The symptom was an assertion failure in the instruction decoder. That in itself isn’t a big surprise: the tool only supports 8086 instructions in segmented real mode, and that probably isn’t going to change soon (perhaps when I start digging into protected-mode Microprose games). But upon closer inspection, it turned out the tool was going into an apparent data block in the middle of a routine, consisting of a bunch of zeros followed by an odd <code class="language-plaintext highlighter-rouge">0xff</code>, which is where the assert happened.</p>

<p>I can’t really avoid this in a tool which is just a static code walker. The instruction preceding the data block is a function call, which probably doesn’t return in the real world, so the CPU never goes into the data (or maybe the “data” is rewritten with legitimate code at runtime), but my tool has no way of knowing that. So what I did was change the assertion into an exception that is caught in the comparison loop. If the tool finds an invalid instruction, it will “roll back”, i.e. mark the entire block from the location where the instruction scanning started as “bad”, then continue with the next location as if nothing happened. I’m happy to say this works pretty well and is going to vastly increase the range of games the tool can work with.</p>
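
<p>Roughly, the rollback looks like the sketch below. This is a simplified Python illustration under my own assumptions, not the real implementation: the opcode table is a made-up toy (it only stores instruction lengths and treats anything it does not list, such as <code class="language-plaintext highlighter-rouge">0xff</code>, as invalid), and <code class="language-plaintext highlighter-rouge">InvalidInstruction</code>, <code class="language-plaintext highlighter-rouge">scan_block</code> and <code class="language-plaintext highlighter-rouge">scan_all</code> are hypothetical names.</p>

```python
class InvalidInstruction(Exception):
    """Raised by the toy decoder instead of tripping an assertion."""
    def __init__(self, offset):
        super().__init__(f"invalid opcode at offset {offset:#x}")
        self.offset = offset

# Toy opcode table (an assumption for this sketch): opcode -> instruction length.
LENGTHS = {0x90: 1, 0x55: 1, 0xC3: 1, 0xE8: 3, 0x00: 2}
RET = 0xC3

def scan_block(code, start):
    """Decode from start until a return; give back the offset just past the block."""
    offset = start
    while offset in range(len(code)):
        opcode = code[offset]
        if opcode not in LENGTHS:
            raise InvalidInstruction(offset)
        offset += LENGTHS[opcode]
        if opcode == RET:
            break
    return offset

def scan_all(code, entry_points):
    """Scan every queued location; roll back blocks that hit an invalid instruction."""
    good, bad = [], []
    for start in entry_points:
        try:
            good.append((start, scan_block(code, start)))
        except InvalidInstruction as err:
            # Rollback: everything from the scan start up to the bad byte is marked
            # as bad, then the loop carries on with the next queued location.
            bad.append((start, err.offset))
    return good, bad

# push bp / call / zeros / stray 0xff, followed by a separate nop / ret routine.
code = bytes([0x55, 0xE8, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0xFF, 0x90, 0xC3])
good, bad = scan_all(code, [0, 9])
```

<p>Here the block starting at offset 0 runs into the stray <code class="language-plaintext highlighter-rouge">0xff</code> and ends up in <code class="language-plaintext highlighter-rouge">bad</code>, while the routine at offset 9 is still scanned normally.</p>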

<p>To make the rollback a little less rough on the outcome, I also increased the scan granularity, i.e. made the scan blocks smaller, so that a rollback will not mark an entire routine as bad when most of it could be perfectly fine. Until now, I would scan an entire routine until I encountered an unconditional jump or a return. Now, every branch, including conditional jumps and calls, incurs a scan break, with the location just past the branch added to the search queue as a separate block of the routine.</p>
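
<p>The finer granularity can be pictured with another small Python sketch. Again, this is only an illustration under my own assumptions: the routine is given as a pre-decoded table of hypothetical mnemonics and lengths, branch targets are ignored to keep it short, and <code class="language-plaintext highlighter-rouge">split_into_blocks</code> is a made-up name.</p>

```python
from collections import deque

BRANCHES = {"jmp", "jz", "jnz", "call", "ret"}  # every one of these ends a scan block
NO_FALLTHROUGH = {"jmp", "ret"}                 # nothing to queue past these

def split_into_blocks(instructions, entry):
    """instructions maps offset -> (mnemonic, length); returns a list of (start, end) blocks."""
    queue = deque([entry])
    visited = set()
    blocks = []
    while queue:
        start = queue.popleft()
        if start in visited or start not in instructions:
            continue
        visited.add(start)
        offset = start
        while offset in instructions:
            mnemonic, length = instructions[offset]
            offset += length
            if mnemonic in BRANCHES:
                if mnemonic not in NO_FALLTHROUGH:
                    # The location just past the branch becomes its own block.
                    queue.append(offset)
                break
        blocks.append((start, offset))
    return blocks

routine = {
    0: ("push", 1),
    1: ("call", 3),
    4: ("cmp", 3),
    7: ("jz", 2),
    9: ("mov", 3),
    12: ("ret", 1),
}
blocks = split_into_blocks(routine, 0)
```

<p>With the old behaviour this routine would have been one block from 0 to 13; here it is split into three, so a rollback in one of them leaves the other two intact.</p>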

<h1 id="boring-bugfixes">Boring bugfixes</h1>

<p>Besides those new features, I also implemented a bunch of fixes for reported issues. All of this goodness is included in version <a href="https://github.com/neuviemeporte/mzretools/releases/tag/v0.9.2">0.9.2</a>, and already merged into the master branch.</p>

<h1 id="did-it-help">Did it help?</h1>

<p>Why, of course not. The feature worked fine, but there aren’t any meaningful differences in the unreachable routines for F15’s <code class="language-plaintext highlighter-rouge">START.EXE</code>. So, it’s back to debugging, but at least I’ve managed to dig myself out of a hole and make a bunch of useful improvements to the tooling.</p>

<p>The next post is going to focus on the difficulties I’ve encountered while trying to make the reconstruction runnable. For now, Happy New Year!</p>]]></content><author><name></name></author><category term="f15-se2" /><summary type="html"><![CDATA[(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)]]></summary></entry></feed>