Ghidra to the rescue
(This post is part of a series on the subject of my hobby project, which is recreating the C source code for the 1989 game F-15 Strike Eagle II by reverse engineering the original binaries.)
I have to admit I’ve been stuck with the project for about 3 months. I’m still reconstructing the source code for the first executable, and I’ve run into a rather long game routine at offset 0x4093
, which has more than 2800 instructions. Worse still, it does not make any sense, at least not at a cursory glance. From live debugging the game, I know that it runs when the ingame briefing screen displays the caption “decoding mission…”, so it appears to be the randomized mission generator. But all it does is read a bunch of nonsense-looking numerical data (some of which comes from reading the .wld
(“world”) binary files, the internal layout of which is still a mystery to me) from one place, call rand()
a lot, then write a bunch of nonsense-looking numerical values back all over the place. No string processing to tell me what these numbers are, no graphical routines called to display stuff that I could watch for context, just a black box crunching numbers.
There’s more bad news. The control flow for this routine is completely bonkers, and consists of multiple nested loops, inside of which are conditions, inside of which are loops… etc., with what looks like an occasional goto thrown in the mix. Reconstructing the flow actually broke my brain to the point where I lost all enthusiasm for the project. It was just a grind, with obfuscated code coming out of the process that barely looking better then the assembly that comes in.
Back when I was talking to the ex-Microprose employee, they told me they had to modify the mission generation code at some point for F117 (the next game on this engine), and it was a convoluted mess, including many goto
s. I’m inclined to think that this the routine. It would be interesting to find an equivalent in F117’s code and compare the two, and I’m planning on writing a tool specifically for identifying similar routines in MZ executables using edit distance at some point, but for now I need to reconstruct this routine’s source code.
I’ve been aware of the existence of Ghidra for a while now, and I’ve even tried opening the game’s executables with it, but the result was not encouraging. Ghidra can decompile assembly code into C, but it is not really meant for reversing 16bit code – as I understand it was developed at NSA, mainly for taking apart malware samples. Modern stuff, not old stuff. The code it outputs for 16bit binaries is largely nonsense, especially around memory access – it just doesn’t seem to understand segmented addressing, but then again, who does? 😋 However, it can recover the control flow pretty well. So what if I could fix the code where Ghidra makes a mess of things, while relying on it to sort out the convoluted control flow mess for me?
That’s what I ended up doing, I just copy-pasted the Ghidra-generated code for the whole routine into my editor and started tweaking it line by line by looking into IDA at the same time. Occassionally, I needed to do some experimenting in a sample DOS .exe application on the side to figure stuff out, but I was making progress (slowly).
One minor problem with Ghidra output (as far as the control flow is concerned, not considering any operations inside control structures) is that it tends to reverse the order of if-else
blocks sometimes compared to what MSC generates. So for example this code compiled with MSC:
if (a == 0) {
// do stuff
}
else {
// do other stuff
}
…would get turned into the following assembly:
cmp a, 0
jnz otherStuff
; do stuff
otherStuff:
; do other stuff
In other words, an inverse conditional jump instruction (jnz
- Jump if Not Zero) is used in comparison to the C condition (a == 0
). This however preserves the logical ordering of blocks from C, so it’s easier to follow. Ghidra will often output this back from the assembly:
if (a != 0) {
// do other stuff
}
else {
// do stuff
}
This can get confusing if the conditional blocks contain many operations, potentially inside other nested blocks, one needs to check carefully if the order of the blocks matches the disassembly annd invert the condition and the order if necessary.
When I’m writing this, the C file with this routine still does not build due to unresolved data references (more on that later), but I’ve finished rewriting the source code (coming in at around 370 lines, including comments), and the control flow looks consistent. I still need to fix the data layout problem, but it feels like I’m past my slump. Below is a sample of the code I wrote on top of Ghidra’s mess, so the reader has an idea of what we are dealing with here:
// [...]
do {
var_2 = var_2 + 1;
if (999 < var_2) goto counterMore1k;
// 40b5
do {
if (missionPick != -1) {
// 40c6
var_1A = randMul(word_19324[missionPick * 2]);
// 40e9
word_1CDE0 = sub_14BB4(off_19304[missionPick * 2][var_1A * 2],
off_19314[missionPick * 2][var_1A * 2], 1);
}
// 40f4
else {
do {
do {
// 40f8
var_1A = randMul(0xe0) * 0x80 + 0x840;
// 410c
var_24 = randMul(0xe0) * 0x80 + 0x840;
// 412d
} while ((wldReadBuf10[var_1A >> 0xb + (var_24 >> 0xb) * 0x10] & 3) != 0);
// 413e
} while ((word_1CDE0 = sub_14BB4(var_1A,var_24,1)) == -1);
}
// 414c
if (missionPick == 7) {
// 4172
word_1CDF2 = sub_14BB4(off_19304[missionPick * 2][var_1A * 2],
off_19314[missionPick * 2][var_1A * 2] + 0x28, 2);
}
// 417e
else if (missionPick == 2) {
var_1A = var_1A * 2 + randMul(2);
// 41a9
word_1CDF2 = sub_14BB4(word_192EC[var_1A * 2], word_192F4[var_1A * 2], 2);
}
// 41b5
else if (missionPick == 6) {
var_1A = randMul(6) + var_1A + 1 & 7;
// 41e0
word_1CDF2 = sub_14BB4(word_19294[var_1A * 2], word_192A4[var_1A * 2], 2);
}
// 41eb
else {
do {
do {
// 41ef
var_1A = randMul(0xe0) * 0x80 + 0x840;
// 4203
var_24 = randMul(0xe0) * 0x80 + 0x840;
// 4224
} while ((wldReadBuf10[(var_1A >> 0xb) + (var_24 >> 0xb) * 0x10] & 3) != 0);
// 4235
word_1CDF2 = sub_14BB4(var_1A, var_24, 2);
} while ((word_1CDF2 == -1) || ((missionPick == 0 && (wldReadBuf4[3 + word_1CDF2 * 0x10] == 0))));
}
// 4257
} while ((word_1CDE0 == word_1CDF2) || (sub_14C94(word_1CDE0, word_1CDF2) >> 6) > 200);
// 427a
} while ((gameData->theater != THEATER_DS)
&& (wldReadBuf4[word_1CDE0 * 0x10] == wldReadBuf4[7 + word_1CDF2 * 0x10]));
// 42a0
for (var_2A = 0; var_2A < 2; var_2A++) {
// 42b8
var_20[var_2A] = 0x7fff;
// 42bd
for (var_26 = wldReadBuf3; var_26 < readItemSize; var_26++) {
// 42d3
if ((((wldReadBuf4[4 +var_26 * 0x10] & 0x500) != 0) && ((wldReadBuf4[4 +var_26 * 0x10] & 0x201) != 0))
&& ((wldReadBuf4[4 +var_26 * 0x10] & 0x800) == 0)) {
// 4332
// placed in var_1C in IDA, but this looks like an array, sort out stack layout later
var_20[2] = sub_15472((wldReadBuf4[4 +var_26 * 0x10] & 0x100 == 0 ? 0 : randMul(100) * 0x40 + 0xc80)
+ sub_14C94(*(&word_1CDDE +1 +var_2A * 0x12),var_26,0,0x7fff));
// 433b
if ((var_20[2] < 0x7000) && (randMul(0x500) + var_20[2] > var_20[var_2A])) {
// 4357
*(&word_1CDDE +2 +var_2A * 0x12) = var_26;
var_20[var_2A] = var_20[2];
}
}
}
}
// [...]
Beautiful, innit? 😉
Here I wanted to discuss some of the more interesting bits of assembly encountered in this routine, and the C code they end up resolving into, with examples of the “interesting” decompiled code generated by Ghidra. I’m not criticizing the tool; it’s my problem that I’m not using it what it was meant for, and without it I would probably still be stuck, but I wanted to provide an account of what can be expected from it when used in this way.
32bit arithmetic, anyone?
One day, I come upon this wonderful code:
startCode1:468C mov si, word_1CDE2
startCode1:4690 mov cl, 4
startCode1:4692 shl si, cl
startCode1:4694 mov ax, word_1C82A[si]
startCode1:4698 sub dx, dx
startCode1:469A mov cl, 5
startCode1:469C loop_1469C:
startCode1:469C shl ax, 1
startCode1:469E rcl dx, 1
startCode1:46A0 dec cl
startCode1:46A2 jz short loc_146A6
startCode1:46A4 jmp short loop_1469C
startCode1:46A6 loc_146A6:
startCode1:46A6 mov word ptr dword_1D5D0, ax
startCode1:46A9 mov word ptr dword_1D5D0+2, dx
Looking over to the C side, what Ghidra generated was this:
iVar12 = *(int *)0x6292 * 0x10;
iVar7 = *(int *)(iVar12 + 0x5cda);
uVar10 = 0;
cVar8 = '\x05';
do {
bVar13 = iVar7 < 0;
iVar7 = iVar7 * 2;
uVar10 = uVar10 << 1 | (uint)bVar13;
cVar8 = cVar8 + -1;
} while (cVar8 != '\0');
*(int *)0x6a80 = iVar7;
*(uint *)0x6a82 = uVar10;
Wait, what?
Part of the problem here is that Ghidra can’t make much sense of 16-bit data references, so there are a lot of casts of raw addresses into pointers that are dereferenced inline, like the *(int *)0x6292
. Then, it decides to create a bunch of temporary intermediary variables like iVar12
to hold parts of the expressions it’s trying to evaluate, but from prior experience I know that introducing extra variables will result in writing the intermediate values to memory, so the temporary variables need to go, and the code needs to be folded into a minimal number of lines if I’m expecting it to literally match.
I also know from experience that when the compiler is using registers ax
and dx
together, it’s usually trying to do arithmetic on 32bit (long
) numbers, which don’t fit into the 16-bit registers of the 8086 CPU. But what’s with the shifting the registers by one bit in a loop? Well, it can’t shift the entire thing by the desired 5 bits in one go, because it needs to process the two halves of the long number separately. That’s why it shifts ax
left by one bit with shl
, but this can cause the leftmost bit to be shifted out into the carry flag, so rcl
(rotate through carry) is used to finalize the shift into dx
. I quickly whip up a one line concept on the side, build it with MSC and disassemble, repeating until I get perfectly matching instructions. What it ends up resolving to is really simple:
dword_1D5D0 = (long)(word_1C82A[word_1CDE2 * 0x10]) << 5;
Bottom line is, if you’re trying to work out 32bit arithmetic in 16bit code using a tool which does not understand 16bit code, then there’s something wrong with you. 🤪
Using example values to figure out opaque arithmetic operation sequences
Another day, another bit of interesting assembly:
startCode1:47D0 mov ax, [bp+var_20] ; ax = 9123 (-28381)
startCode1:47D3 cwd ; dx = ffff (-1)
startCode1:47D4 xor ax, dx ; ax = 6edc (28380)
startCode1:47D6 sub ax, dx ; ax = 6edd (28381)
startCode1:47D8 mov cx, 2
startCode1:47DB sar ax, cl ; ax = 1bb7 (7095)
startCode1:47DD xor ax, dx ; ax = e448 (-7096)
startCode1:47DF sub ax, dx ; ax = e449 (-7095)
startCode1:47E1 mov cx, 4
startCode1:47E4 sub cx, difficultySaved ; cx = 3
startCode1:47E8 imul cx ; ax = acdb (-21285)
startCode1:47EA mov [bp+var_14], ax
Ghidra is no help as expected:
uVar10 = (int)var_20[0] >> 0xf;
local_16 = (((int)((var_20[0] ^ uVar10) - uVar10) >> 2 ^ uVar10) - uVar10) * // 😭😭😭
(4 - difficultySaved);
It’s using cwd
(convert word to double) to extend a 16-bit value in register ax
to 32 bits through dx
. So, is it 32 bit arithmetic again? Maybe, but why doesn’t it use dx
at the end, just writes the ax
part into a local value? Here, I’ve decided to annotate the assembly with register values for a particular starting value (here, a negative number like 0x9123
was more interesting to see what was happening). You can see that xor+sub
is used to obtain the absolute value of ax
, while dx
just holds the sign bit, as it were. Then sar
is used to perform division by 4 on the positive value, and another xor+sub
restores the original signedness using the sign value stored in dx
. Finally, the result is multiplied by another value in an unremarkable way. This is just an equivalent of:
int difficultySaved;
void foobar() {
int var_20;
int var_14;
var_14 = (var_20 / 4) * (4 - difficultySaved);
}
In other words, ax:dx
and cwd
usually means a 32bit number manipulation (number or pointer) – unless it doesn’t. 😈 In this case, the compiler seems to have figured out that it could do division by a power of two using sar
more efficiently than through idiv
, but it needed to force the number to positive and store the sign bit for undoing the conversion before writing back the result.
32bit arithmetic out of left field
You know the drill by now:
startCode1:46B9 mov ax, 708h
startCode1:46BC cwd
startCode1:46BD mov cx, word ptr word_1C82C[si] ; cx = 42c0 (17088)
startCode1:46C1 sub bx, bx ; bx = 0
startCode1:46C3 sub cx, 8000h ; cx = c2c0 (49856), carry = 1
startCode1:46C7 sbb bx, bx ; bx = ffff
startCode1:46C9 neg cx ; cx = 3d40 (15680)
startCode1:46CB adc bx, 0 ; bx = 0
startCode1:46CE neg bx ; bx = 0
startCode1:46D0 mov word ptr [bp+var_30], cx ; 3d40
startCode1:46D3 mov word ptr [bp+var_30+2], bx ; 0
I won’t even bother looking at Ghidra’s ideas on this. A red herring is present in the form of the sbb+neg
pattern which I found used to perform branchless NULL
checks before, but that is not what this is.
Tracing register values through an example execution is also helpful here. This was actually part of a longer calculation, but the cwd
on a constant value is a hint that we’re dealing with long numbers again. But it’s using bx:cx
to hold the halves this time, because ax
is occupied already and will be used in a later part of the expression.
The input is obviously the word value 0x4c20
in cx
, and the result is the long number 0x00003d40
in bx:cx
placed in var_30
at the end. Looking at the instructions, the literal 0x8000
is used in a sub
instruction, so that value is also part of the calculation. Incidentally, 0x8000 - 0x4c20 = 0x3d40
, so I can risk a guess without even trying to untangle the sbb-neg-adc
mumbo-jumbo:
int word_1C82C;
void foobar() {
long var_30;
var_30 = 0x8000 - (long)word_1C82C;
}
Surprisingly enough, it actually matches the assembly. Woohoo! Now it’s easier to look at the actual instructions to make sense of what the compiler did there. It’s zeroing out bx
for the older word of the double number, and putting the input in cx
. Instead of putting 0x8000
in a register and subtracting cx
from it, it does the opposite which I guess is related to the fact that it doesn’t have too many registers to spare. The negative carry of the result is placed in bx
, which means it becomes -1 when the result was negative (meaning the original expression of cx - 0x8000
is positive). Then cx
is negated to obtain the younger part of the result, the carry is added back to bx
, making it zero out, and lo and behold we have the desired value of 0x00003d40
in bx:cx
The takeaway is that long arithmetic can jump out of the bushes and kick you in the butt using a different set of registers than what you’re used to, and that known idioms like sbb+neg
can be misleading.
Manipulating register halves for fun and profit
This had me puzzled:
startCode1:4784 mov ax, word_182BE
startCode1:4787 mov cl, 0Ah
startCode1:4789 shr ax, cl
startCode1:478B shl ax, cl
startCode1:478D add ah, 2
startCode1:4790 mov word_182BE, ax
8bit register halves like ah
are rarely used unless explicit 8bit value manipulation is requested, and it’s especially strange in the middle of what appears to be regular 16bit calculation. Oddly enough, it appears to be a shortcut to adding a constant number whose lower byte is zero, and is matched by the code below:
unsigned int word_182BE;
int func2() {
word_182BE = ((word_182BE >> 0xa) << 0xa) + 0x200;
}
This will be the last arithmetic puzzle, I promise
Some fun code from the very end of this large routine:
startCode1:4B9D mov ax, [bp+var_8]
startCode1:4BA0 add ax, word_1DD38
startCode1:4BA4 cwd
startCode1:4BA5 mov cx, 96h
startCode1:4BA8 idiv cx
startCode1:4BAA sub word_1DD38, dx
startCode1:4BAE pop si
startCode1:4BAF pop di
startCode1:4BB0 mov sp, bp
startCode1:4BB2 pop bp
startCode1:4BB3 retn
startCode1:4BB3 sub_14093 endp
startCode1:4BB3
The cwd
might be a cast of the accumulator into long again, then again it might not. The end result of this operation is to decrease the value of word_1DD38
by the amount of dx
, but why bother putting stuff in ax
and perform calculations that you’re not going to use?
This confused me because I was unaware how the idiv
instruction operates. I thought it just divided the accumulator by the argument, but this is only true in the case where the argument is byte-sized. When the argument is word-sized, it actually divides the long value in dx:ax
by the argument. Also, in such case in addition to the division result in ax
, it puts the remainder (modulus) in dx
(al
and ah
are used for the byte-sized variant). So in the end the last statement of the routine is:
word_1DD38 -= (var_8 + word_1DD38) % 0x96;
Jump table usage to implement a switch
This part of the code actually gave my tooling trouble in the past, because offset values are placed directly into the code segment, and I had to implement guard rails to make sure I was not interpreting what is essentially data inside the code segment as machine instructions.
; ... value to switch on is calculated and placed into ax
startCode1:49E9 jmp short switch_14A0D
[...]
startCode1:4A0D
startCode1:4A0D switch_14A0D:
startCode1:4A0D cmp ax, 8
startCode1:4A10 ja short case246_14A2C
startCode1:4A12 add ax, ax
startCode1:4A14 xchg ax, bx
startCode1:4A15 jmp cs:off_14A1A[bx]
startCode1:4A1A off_14A1A dw offset case013_149EB
startCode1:4A1C dw offset case013_149EB
startCode1:4A1E dw offset case246_14A2C
startCode1:4A20 dw offset case013_149EB
startCode1:4A22 dw offset case246_14A2C
startCode1:4A24 dw offset case578_149FB
startCode1:4A26 dw offset case246_14A2C
startCode1:4A28 dw offset case578_149FB
startCode1:4A2A dw offset case578_149FB
As expected, a jump table like this is used to implement a switch statement. It’s doing a cmp
to make sure the value is within the bounds of the jump table, then doubles the value with add
to account for the fact that entries in the jump table are 2 bytes long (16bit offsets), finally places the offset into bx
and does a jump into a location from the jump table. Here, a minor twist lies in the fact that one of the cases is empty, and the switch just jumps to the same bit of code for the values of 2/4/6 as when the value is out of bounds. Might also been written as a fall-through into the default case, but right now I have it like this:
// 4a0d
switch((gameData->flag4 != 0) + randMul(5) + difficultySaved) {
case 0:
case 1:
case 3:
// 49eb
var_18 = word_18930[var_18 * 4];
break;
case 2:
case 4:
case 6:
break;
case 5:
case 7:
case 8:
// 49fb
var_18 = word_1892E[var_18 * 4];
break;
} // 4a2c
// 4a36
word_1C82E[var_26 * 0x10] = var_18;
It’s just an interesting example of how MSC implements a switch statement, for future reference.
Reconstructing the data segment layout
A major challenge in a project like this is figuring out how the data was organized into variables in the original code, because the way I see it now is just as a bunch of binary values in linear sequence, without much of a hint where any boundaries between consecutive values originally were.
Of course, IDA traces references to data from the code and assigns autogenerated names to locations which were referenced. Sometimes, in combination with careful analysis, this allows me to figure out the purpose of a chunk of data and where it likely begins and ends:
startData:076E timerCounter db 0
startData:076F timerCounter2 db 0
startData:0770 timerCounter3 db 0
startData:0771 timerCounter4 db 0
Most often however, I have no idea what these values originally were:
startData:29FE word_1954E dw 297Eh ; DATA XREF: __getstream+Ar
startData:29FE ; _flushall:loc_16828r
startData:2A00 word_19550 dw 0 ; DATA XREF: __flsbuf+90w
startData:2A00 ; __openfile+B8w
startData:2A02 word_19552 dw 0 ; DATA XREF: _malloc+1Fw
startData:2A04 word_19554 dw 0 ; DATA XREF: _malloc+22w
startData:2A06 db 0
startData:2A07 db 0
startData:2A08 word_19558 dw 0 ; DATA XREF: _malloc+32w
The named word_/byte_
locations are referenced in the code, but that does not mean these correspond to consecutive variables:
int x = 0x297e; // word_1954E
int y = 0; // word_19550
int z = 0; // word_19552
int q = 0; // word_19554
int r = 0; // unreferenced
int v = 0; // word_19558
This could just as well have been an array, and the references come from directly accessing indices within:
int x[123];
x[5] = x[0] + x[1] + x[2] + x[3]
…or even a struct:
struct Foo {
int x; // word_1954E
int y; // word_19550
int z; // word_19552
int q; // word_19554
int r; // unreferenced
int v; // word_19558
} f = { 0x297e, 0, 0, 0, 0, 0 };
f.v = f.x + f.y + f.z + f.r;
So, to a concrete example. A particular static buffer area that is read into from the .wld
parsing routine is looking in IDA as following:
startData:64BC wldReadBuf6 dw ?
startData:64BE word_1D00E dw ?
startData:64C0 word_1D010 dw ?
startData:64C2 db 10h dup(?)
startData:64D2 word_1D022 dw ?
startData:64D4 word_1D024 dw ?
startData:64D6 db ?
startData:64D7 db ?
startData:64D8 word_1D028 dw ?
startData:64DA db ?
startData:64DB db ?
startData:64DC db ?
[...about 700 bytes ]
I don’t know what exactly this buffer contains, but that is fine at this point. The problem is that the location 0x64bc
, aka wldReadBuf6
is referenced with an index in the game code:
startCode1:48E5 mov [bp+var_14], ax
startCode1:48E8 mov ax, 24h ; '$'
startCode1:48EB imul [bp+var_26]
startCode1:48EE mov bx, ax
startCode1:48F0 mov ax, [bp+var_C]
startCode1:48F3 mov wldReadBuf6[bx], ax ; 😲
That would definitely make it read into the subsequent values marked by IDA as referenced elsewhere (word_1D00E
etc.). That in itself might not be surprising. Perhaps wldReadBuf6
was an array of ints, and the latter references come from indexing into it with constant indices?
int wldReadBuf6[700];
void someFunc() {
for (int i = 0; i < sizeof(wldReadBuf6); i++) {
wldReadBuf6[i] = ... // accessing the data inside through a variable index
}
wldReadBuf[1] = wldReadBuf[2] + ... // accessing the same data through constant indices causing the baked-in references like word_1D00E
}
So I check where and how word_1D00E
and friends are used in the code with IDA’s cross-reference:
startCode1:4854 mov ax, 24h ; '$'
startCode1:4857 imul [bp+var_26]
startCode1:485A mov di, ax
[...]
startCode1:4869 sub ax, word_1D00E[di]
Turns out the subsequent words are also used as bases for an index all over the code. So what, a bunch of overlapping arrays? Dynamically-calculated pointers to specific locations in the one primary array? Not likely, I would see offsets to them stored somewhere. Maybe a more complicated calculation for the index, e.g.
wldReadBuf[1 + var_26 * 0x24]; // would be equivalent to word_1D00E[var_26 * 0x24]
wldReadBuf[2 + var_26 * 0x24]; // word_1D010[var_26 * 0x24]
wldReadBuf[11 + var_26 * 0x24]; // word_1D022[var_26 * 0x24]
I was scratching my head for a while, when I realized that the addition could be moved to the end of the index:
wldReadBuf[(var_26 * 0x24) + 1]; // word_1D00E[var_26 * 0x24]
wldReadBuf[(var_26 * 0x24) + 2]; // word_1D010[var_26 * 0x24]
wldReadBuf[(var_26 * 0x24) + 11]; // word_1D022[var_26 * 0x24]
It’s not using these word locations as a base for the index, it’s indexing into same-sized “slots” from the beginning of the buffer, and then reaching into offsets within those slots. It’s an array of structures! The location is the same, but I had the layout backwards. The constant 0x24
factor in the multiplication is the size of the structure, and the added offsets are the structure members.
00000000 Buf6Item struc ; (sizeof=0x24, mappedto_9)
00000000 field_0 dw ?
00000002 field_2 dw ?
00000004 field_4 db 18 dup(?)
00000016 field_16 dw ?
00000018 field_18 db 4 dup(?)
0000001C field_1C db 8 dup(?)
00000024 Buf6Item ends
Essentially, the layout of the words at the beginning of the buffer marked off as references by IDA maps 1:1 into the structure layout. That makes the code make sense now
startData:64BC ; struct Buf6Item wldReadBuf6[20]
startData:64BC wldReadBuf6 Buf6Item 14h dup(<?>)
[...]
startCode1:484C mov si, word_1CDE2
startCode1:4850 mov cl, 4
startCode1:4852 shl si, cl
startCode1:4854 mov ax, SIZEOF_BUF6ITEM ; 0x24
startCode1:4857 imul [bp+var_26]
startCode1:485A mov di, ax
startCode1:485C mov ax, word ptr wldReadBuf6.field_4[di]
startCode1:4860 sub ax, word ptr wldReadBuf4.field_4[si]
startCode1:4864 push ax
startCode1:4865 mov ax, wldReadBuf4.field_2[si]
startCode1:4869 sub ax, wldReadBuf6.field_2[di]
I also made my Python script which parses IDA listings to auto-generate header files for C compatible with structure data. It doesn’t spit out structure layouts for C automatically (yet), but at least I don’t have to manually rewrite the headers every time I mix things up in IDA. As I mentioned in the introduction, the routine still does not compile due to not all data references being resolved on the C side, but I’m confident I can work it out in a couple days. Then it will just be a matter of running mzdiff
on the end result, and likely a couple of iterations to iron out the remaining discrepancies. But at least it feels doable, which I couldn’t say before I opted to use Ghidra for recovering the control flow. I think I might try to use it more in the future, even despite its limitations.