(…continued from Part 1)

If there’s anything life has taught me, it’s got to be that submitting a question publically is a surefire way of figuring out the answer myself within an hour or so… unless it actually takes a couple months.

For context, I am trying to find parts of the game’s data segment which can be attributed to linking in the Microsoft C library. In the first part, I figured out initialized data as the easier task, but was left stumped with the layout of the uninitialized (BSS) data section, which contains a little over 1500 bytes of unaccounted-for data:

dseg:0346                 db 0FFh
dseg:0347                 db 0FFh
dseg:0348                 db 0FFh 🟢 libc initialized data ends here, BSS begins
dseg:0349                 db 407h dup(   ?)       ; what is this?
dseg:0750 word_11CD0      dw ?                    ; my program's data
dseg:0752 word_11CD2      dw ?                    ; 
dseg:0754                 db 20Ch dup(   ?)       ; what is this?
dseg:0754 dseg            ends

I started approaching this with the idea I had at the end of the previous part - removing all the libc function references from my experiment and readding them one by one, to figure out what makes these mysterious areas appear and narrow down where they are coming from. Soon enough, I had my culprit - any of the file I/O-related functions such as fopen/fread/fflush/… would result in these appearing. It looks like it might be some buffer area related to file I/O? Makes a lot of sense, but what doesn’t is that there were no references to these areas from the libc code. I looked over the code of all libc functions in a minimal executable file reproducing this result (just the startup code and main() calling fflush()), paying special attention to immediate values which could be offsets into the BSS – no luck. The only reference same from the startup code:

seg000:0064 start           proc near
[...]
seg000:00C9                 assume es:dseg
seg000:00C9                 cld
seg000:00CA                 mov     di, 349h ; beginning of BSS area
seg000:00CD                 mov     cx, 960h ; data segment end
seg000:00D0                 sub     cx, di   ; _end - _edata = BSS size
seg000:00D2                 xor     ax, ax
seg000:00D4                 rep stosb        ; zero out the BSS
[...]

This code performs zeroing out of the BSS before control passes to main(). But there was no code using the BSS!

I started opening the individual object files for file I/O routines extracted (using objconv) from the slibce.lib library like fflush.obj, fopen.obj etc. in IDA, hoping to find some BSS data inside. No such luck. Where is this crap coming from?!

Then I get an idea. I was using the /M switch to the linker this whole time, which creates a human readable “map file” of the linked executable. Let’s look inside:

Start  Stop   Length Name                   Class
 00000H 0157DH 0157EH _TEXT                  CODE
 0157EH 0157EH 00000H C_ETEXT                ENDCODE
 01580H 015C1H 00042H NULL                   BEGDATA
 015C2H 017DFH 0021EH _DATA                  DATA
 017E0H 017EDH 0000EH CDATA                  DATA
 017EEH 017EEH 00000H XIFB                   DATA
 [...]
 017F0H 017F0H 00000H CONST                  CONST
 [...]
 018CAH 018CAH 00000H _BSS                   BSS
 018CAH 018CAH 00000H XOB                    BSS
 018CAH 018CAH 00000H XO                     BSS
 018CAH 018CAH 00000H XOE                    BSS
 018D0H 01ED3H 00604H c_common               BSS
 01EE0H 026DFH 00800H STACK                  STACK

 Origin   Group
 0158:0   DGROUP

  Address         Publics by Name

 0158:0250       STKHQQ
 0000:0788       _abs
 0158:0050       _barfoo
 0000:14BA       _brkctl
 0000:0E3E       _close
 0158:034A       _edata
 0158:0960       _end
 [..]

This is surprising. I was expecting all my uninitialized data to be part of the _BSS segment, meanwhile it looks like the bulk (0x604 == 1540 bytes) lies in c_common, and the size of _BSS is actually zero. It takes a web search into some long forgotten corners of the Web to figure out what c_common is:

LINK also generates default segments into which it places communal variables declared in COMDEF records.
Near communal variables are placed in one paragraph-aligned public segment named c_common, with class name BSS
(block storage space) and group DGROUP. Far communal variables are placed in a paragraph-aligned segment named FAR_BSS,
with class name FAR_BSS.

An another search defines “communal variables”:

Communal variables are uninitialized variables that are both public and external. They are often declared in include files.
(…) C variables declared outside functions (except static variables) are communal unless explicitly initialized; they are the same as assembly-language communal variables.

OK, so we are looking for communal variables, but where are they? The smoking gun is hidden further in the map file, where all the public symbols are listed (“Publics by Name”). It seems like the data segment is represented by the segment prefix of 158 in the map file, so let’s take a look at it while introducing some ordering:

ninja@dell:eaglestrike$ cat build-f15-se2/HELLO.MAP | grep 158: | sort -u
 0158:0   DGROUP
 0158:0042       _foobar
 0158:0050       _barfoo
 0158:0052       _foobaz
 0158:0062       __asizds
 0158:0064       __atopsp
 0158:0066       __aexit_rtn
 0158:0068       __abrktb
 0158:00B8       __abrkp
 [...]
 0158:034A       _edata
 0158:0350       __bufout
 0158:0550       __bufin
 0158:0750       _x
 0158:0752       _y
 0158:0754       __buferr
 0158:0960       _end

Gotcha! _edata is a handy tag assigned by the linker to the location where the initialized data ends, so it seems my troublemakers are named __bufout, __bufin and __buferr (x and y are sentinel values from my code, which I wrote in Part 1). Whare did y’all come from?

ninja@dell:slibce$ for f in *.obj; do echo "==== $f"; dmpobj $f | grep bufout; done
[...]
==== _file.obj
    3 - '__bufout' Type 0, NEAR, Size:00000200h
==== _flsbuf.obj
    5 - '__bufout' Type:0
[...]

Looks like these came from the object file _file.obj, so let’s inspect it:

ninja@dell:slibce$ dmpobj _file.obj
[...]
COMDEF(b0) recnum:13, offset:000000b0h, len:002ah, chksum:f7h(f7)
    2 - '__bufin' Type 0, NEAR, Size:00000200h
    3 - '__bufout' Type 0, NEAR, Size:00000200h
    4 - '__buferr' Type 0, NEAR, Size:00000200h
[...]

Three buffers of size 0x200 make 0x600, i.e. 1536 bytes. Now I smell blood. But how come I couldn’t find any references to these? No code is using it, and without having some reference to grab onto, I will not be able to locate their equivalent locations in the actual game executable. This time, the key lies in the other object file that references this symbol, _flsbuf.obj. I open it in IDA:

_TEXT:0010 __flsbuf        proc near
[...]
_TEXT:00A8                 jnz     short loc_100E0
_TEXT:00AA                 mov     ax, offset __bufout
_TEXT:00AD                 jmp     short loc_100E3
_TEXT:00AF                 align 2
_TEXT:00B0 loc_100E0:
_TEXT:00B0                 mov     ax, offset __buferr

My minimal example does not have __flsbuf. But when I add all the other libc functions that the game is using, it is also pulled in by the linker. Apparently, these three buffer areas are placed in the executable by the linker even if nothing is referencing them! Damn, those old linkers sure acted weird. I could not find references to the problematic areas because I was looking at a minimal example which had the data, but not the code using the data, which I thought was not possible. But now I can locate __bufout and __buferr by looking at this specific location in __flsbuf. That still leaves __bufin though.

ninja@dell:slibce$ for f in *.obj; do echo "==== $f"; dmpobj $f | grep bufin; done
[...]
==== _file.obj
    2 - '__bufin' Type 0, NEAR, Size:00000200h

This symbol is only referenced in _file.obj, so let’s open it in IDA again:

_DATA:0000 ; FILE _iob[]
_DATA:0000 __iob           dw offset __bufin
_DATA:0002                 align 4
_DATA:0004                 dw offset __bufin
_DATA:0006                 db    1
_DATA:0007                 db    0
_DATA:0008                 db    0
_DATA:0009                 db    0
_DATA:000A                 db    0
_DATA:000B                 db    0
_DATA:000C                 db    0
_DATA:000D                 db    0
_DATA:000E                 db    2
_DATA:000F                 db    1
_DATA:0010                 db    0
_DATA:0011                 db    0
_DATA:0012                 db    0
_DATA:0013                 db    0
_DATA:0014                 db    0
_DATA:0015                 db    0
_DATA:0016                 db    2
_DATA:0017                 db    2
_DATA:0018                 db    0
_DATA:0019                 db    0
_DATA:001A                 db    0
_DATA:001B                 db    0
_DATA:001C                 db    0
_DATA:001D                 db    0
_DATA:001E                 db  84h ; you will be useful

By searching for the value of 0x84 in the executable’s initialized data section, then going back counting the bytes, I can locate the offset containing the pointer to __bufin. Now it all comes together beautifully:

dseg:0346                 db 0FFh
dseg:0347                 db 0FFh
dseg:0348                 db 0FFh
dseg:0349                 db    ? ; probably alignment
dseg:034A bss_start       db    ? ; crt0 data     ; DATA XREF: start+66
dseg:034B                 db    ? ;
dseg:034C                 db    ? ;
dseg:034D                 db    ? ;
dseg:034E                 db    ? ;
dseg:034F                 db    ? ;
dseg:0350 bufout          db 200h dup(   ?)       ; DATA XREF: __flsbuf+9A
dseg:0550 bufin           db 200h dup(   ?)       ; DATA XREF: dseg:012E
dseg:0750 word_11CD0      dw ?                    ; x
dseg:0752 word_11CD2      dw ?                    ; y
dseg:0754 buferr          db 200h dup(   ?)       ; DATA XREF: __flsbuf:loc_10BBA
dseg:0954                 db    ? ; probably more alignment to end on paragraph boundary
dseg:0955                 db    ?
dseg:0956                 db    ?
dseg:0957                 db    ?
dseg:0958                 db    ?
dseg:0959                 db    ?
dseg:095A                 db    ?
dseg:095B                 db    ?
dseg:095C                 db    ?
dseg:095D                 db    ?
dseg:095E                 db    ?
dseg:095F                 db    ?
dseg:095F dseg            ends

It was trivial to find the corresponding areas in the game’s executable by examining the equivalent telltale locations. I now have all the data regions introduced by libc identified and marked in the disassembly. It may not be much, but it’s always an extra piece of information and a necessary step if I ever aim for a near-perfect recreation (I’ll explain why a fully perfect one is not feasible another time). If it ever comes to that, this post will have a Part 3, but I’m pretty satisfied with what I got right now.

If that’s the good news, the bad news has got to be that I’m officially out of excuses to continue the reconstruction of a nasty game routine that ostensibly has to do with mission generation, consists of multiple nested loops and conditions, with an occasional goto sprinkled in, and nothing it does makes sense on the surface - just reading and writing of random-looking numeric data all over the place. But I’ll figure it out. Some day.