More delinking fun
(…continued from Part 1)
If there’s anything life has taught me, it’s got to be that submitting a question publically is a surefire way of figuring out the answer myself within an hour or so… unless it actually takes a couple months.
For context, I am trying to find parts of the game’s data segment which can be attributed to linking in the Microsoft C library. In the first part, I figured out initialized data as the easier task, but was left stumped with the layout of the uninitialized (BSS) data section, which contains a little over 1500 bytes of unaccounted-for data:
dseg:0346 db 0FFh dseg:0347 db 0FFh dseg:0348 db 0FFh 🟢 libc initialized data ends here, BSS begins dseg:0349 db 407h dup( ?) ; what is this? dseg:0750 word_11CD0 dw ? ; my program's data dseg:0752 word_11CD2 dw ? ; dseg:0754 db 20Ch dup( ?) ; what is this? dseg:0754 dseg ends
I started approaching this with the idea I had at the end of the previous part - removing all the libc function references from my experiment and readding them one by one, to figure out what makes these mysterious areas appear and narrow down where they are coming from. Soon enough, I had my culprit - any of the file I/O-related functions such as fopen
/fread
/fflush
/… would result in these appearing. It looks like it might be some buffer area related to file I/O? Makes a lot of sense, but what doesn’t is that there were no references to these areas from the libc code. I looked over the code of all libc functions in a minimal executable file reproducing this result (just the startup code and main()
calling fflush()
), paying special attention to immediate values which could be offsets into the BSS – no luck. The only reference same from the startup code:
This code performs zeroing out of the BSS before control passes to main()
. But there was no code using the BSS!
I started opening the individual object files for file I/O routines extracted (using objconv) from the slibce.lib
library like fflush.obj
, fopen.obj
etc. in IDA, hoping to find some BSS data inside. No such luck. Where is this crap coming from?!
Then I get an idea. I was using the /M
switch to the linker this whole time, which creates a human readable “map file” of the linked executable. Let’s look inside:
Start Stop Length Name Class 00000H 0157DH 0157EH _TEXT CODE 0157EH 0157EH 00000H C_ETEXT ENDCODE 01580H 015C1H 00042H NULL BEGDATA 015C2H 017DFH 0021EH _DATA DATA 017E0H 017EDH 0000EH CDATA DATA 017EEH 017EEH 00000H XIFB DATA [...] 017F0H 017F0H 00000H CONST CONST [...] 018CAH 018CAH 00000H _BSS BSS 018CAH 018CAH 00000H XOB BSS 018CAH 018CAH 00000H XO BSS 018CAH 018CAH 00000H XOE BSS 018D0H 01ED3H 00604H c_common BSS 01EE0H 026DFH 00800H STACK STACK Origin Group 0158:0 DGROUP Address Publics by Name 0158:0250 STKHQQ 0000:0788 _abs 0158:0050 _barfoo 0000:14BA _brkctl 0000:0E3E _close 0158:034A _edata 0158:0960 _end [..]
This is surprising. I was expecting all my uninitialized data to be part of the _BSS
segment, meanwhile it looks like the bulk (0x604 == 1540 bytes) lies in c_common
, and the size of _BSS
is actually zero. It takes a web search into some long forgotten corners of the Web to figure out what c_common
is:
LINK also generates default segments into which it places communal variables declared in COMDEF records.
Near communal variables are placed in one paragraph-aligned public segment named c_common, with class name BSS
(block storage space) and group DGROUP. Far communal variables are placed in a paragraph-aligned segment named FAR_BSS,
with class name FAR_BSS.
An another search defines “communal variables”:
Communal variables are uninitialized variables that are both public and external. They are often declared in include files.
(…) C variables declared outside functions (except static variables) are communal unless explicitly initialized; they are the same as assembly-language communal variables.
OK, so we are looking for communal variables, but where are they? The smoking gun is hidden further in the map file, where all the public symbols are listed (“Publics by Name”). It seems like the data segment is represented by the segment prefix of 158
in the map file, so let’s take a look at it while introducing some ordering:
ninja@dell:eaglestrike$ cat build-f15-se2/HELLO.MAP | grep 158: | sort -u 0158:0 DGROUP 0158:0042 _foobar 0158:0050 _barfoo 0158:0052 _foobaz 0158:0062 __asizds 0158:0064 __atopsp 0158:0066 __aexit_rtn 0158:0068 __abrktb 0158:00B8 __abrkp [...] 0158:034A _edata 0158:0350 __bufout 0158:0550 __bufin 0158:0750 _x 0158:0752 _y 0158:0754 __buferr 0158:0960 _end
Gotcha! _edata
is a handy tag assigned by the linker to the location where the initialized data ends, so it seems my troublemakers are named __bufout
, __bufin
and __buferr
(x
and y
are sentinel values from my code, which I wrote in Part 1). Whare did y’all come from?
ninja@dell:slibce$ for f in *.obj; do echo "==== $f"; dmpobj $f | grep bufout; done [...] ==== _file.obj 3 - '__bufout' Type 0, NEAR, Size:00000200h ==== _flsbuf.obj 5 - '__bufout' Type:0 [...]
Looks like these came from the object file _file.obj
, so let’s inspect it:
ninja@dell:slibce$ dmpobj _file.obj [...] COMDEF(b0) recnum:13, offset:000000b0h, len:002ah, chksum:f7h(f7) 2 - '__bufin' Type 0, NEAR, Size:00000200h 3 - '__bufout' Type 0, NEAR, Size:00000200h 4 - '__buferr' Type 0, NEAR, Size:00000200h [...]
Three buffers of size 0x200 make 0x600, i.e. 1536 bytes. Now I smell blood. But how come I couldn’t find any references to these? No code is using it, and without having some reference to grab onto, I will not be able to locate their equivalent locations in the actual game executable. This time, the key lies in the other object file that references this symbol, _flsbuf.obj
. I open it in IDA:
My minimal example does not have __flsbuf
. But when I add all the other libc functions that the game is using, it is also pulled in by the linker. Apparently, these three buffer areas are placed in the executable by the linker even if nothing is referencing them! Damn, those old linkers sure acted weird. I could not find references to the problematic areas because I was looking at a minimal example which had the data, but not the code using the data, which I thought was not possible. But now I can locate __bufout
and __buferr
by looking at this specific location in __flsbuf
. That still leaves __bufin
though.
ninja@dell:slibce$ for f in *.obj; do echo "==== $f"; dmpobj $f | grep bufin; done [...] ==== _file.obj 2 - '__bufin' Type 0, NEAR, Size:00000200h
This symbol is only referenced in _file.obj
, so let’s open it in IDA again:
By searching for the value of 0x84 in the executable’s initialized data section, then going back counting the bytes, I can locate the offset containing the pointer to __bufin
. Now it all comes together beautifully:
It was trivial to find the corresponding areas in the game’s executable by examining the equivalent telltale locations. I now have all the data regions introduced by libc identified and marked in the disassembly. It may not be much, but it’s always an extra piece of information and a necessary step if I ever aim for a near-perfect recreation (I’ll explain why a fully perfect one is not feasible another time). If it ever comes to that, this post will have a Part 3, but I’m pretty satisfied with what I got right now.
If that’s the good news, the bad news has got to be that I’m officially out of excuses to continue the reconstruction of a nasty game routine that ostensibly has to do with mission generation, consists of multiple nested loops and conditions, with an occasional goto
sprinkled in, and nothing it does makes sense on the surface - just reading and writing of random-looking numeric data all over the place. But I’ll figure it out. Some day.