With that, I decided to have a quick game of Mini Golf, since that's just about the only game I have for Windows 3.1 that actually plays any sound! I've had a copy of this game forever, but this would be a "new" way of playing it, of sorts! The game is from DGS Software, and appears to be a version of this, although the word "Twisted" doesn't appear anywhere in my copy, and mine has more courses than mentioned on that page…
Anyway, when I fired up the game, sure enough, I heard the jingle playing on the DGS logo, and I was able to start playing a course. However, after a few putts, I got a message about a General Protection fault (#GP), with "Close" and "Ignore" options (but the latter just made it pop up again). My first thought was that this was a bug in the HD Audio driver itself – perhaps it wasn't holding up as well on my hardware as I thought. After all, as I mentioned, the driver is rather choosy about what sort of environment it can run in. Still, the issue with Mini Golf seemed pretty systematic, and I decided it merited further investigation.
The Investigation
I downloaded the latest Win16 build (yes, they are still published regularly!) of OpenWatcom V2, and installed it under Windows 3.1. Then, having booted natively with the HDA driver in place, I used the OpenWatcom debugger to run GOLF.EXE. Again, I started playing, and again, I got a #GP after a short while, but this time it was trapped by the debugger!
So, what was happening? Firstly, the fault was inside GOLF.EXE, so it was nothing to do with the sound driver per se, which was encouraging. This was the offending instruction:
2:6c98: c5 4e 0a lds cx, [bp+0Ah]
So, it's loading a far pointer into DS:CX. Windows is in Standard Mode, which means the CPU is in Protected Mode, so this can indeed cause a #GP if the segment being loaded into the DS register isn't a valid selector in the GDT/LDT, or if the program doesn't have the privilege to access it.
Right then, what far pointer was on the stack at SS:[BP+0Ah]?
FFFF:FFFF
Oh dear. While FFFF could be a Ring-3 LDT selector (if the OS had set the LDT limit to the maximum possible), this definitely does not look like a valid pointer!
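To see why FFFF at least parses as a selector, here is a minimal Python sketch that splits a 16-bit selector into its architectural fields: the descriptor index in bits 15-3, the table indicator in bit 2, and the requested privilege level in bits 1-0. The function name `decode_selector` is my own, purely for illustration.

```python
def decode_selector(selector):
    """Split a 16-bit protected-mode selector into (index, table, rpl)."""
    selector &= 0xFFFF
    index = selector >> 3                        # descriptor index within the table
    table = "LDT" if selector & 0x4 else "GDT"   # table indicator (TI) bit
    rpl = selector & 0x3                         # requested privilege level
    return index, table, rpl


print(decode_selector(0xFFFF))  # (8191, 'LDT', 3)
```

So FFFF names the very last possible LDT entry at Ring 3, which only exists if the LDT limit is at its maximum.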
I looked at the next few lines to see what it tries to do with this pointer:
2:6c9b: 8c da mov dx, ds
2:6c9d: 8b c2 mov ax, dx
2:6c9f: 0b c1 or ax, cx
2:6ca1: 74 39 jz 6cdc
So it immediately moves the segment of this far pointer into the general-purpose register DX, copies it into AX, and ORs it with the offset in CX. If the segment and offset are both zero, the OR sets the zero flag, and in that case, it jumps to the end of the function. Realizing this, I manually used the debugger to change the FFFF:FFFF on the stack to 0000:0000, and resumed execution. Sure enough, the game proceeded, albeit with some minor graphical glitches due to the interruption!
However, it wasn't long before another #GP occurred, at the same instruction and for the same reason. I could keep manually changing the Fs to 0s, but this was really no way to play the game! I needed to dig in and understand how this crazy pointer was being passed to this function.
I went up the stack to get to the calling function, and see where the "pointer" was coming from. The progression included the following:
2:7271: 2b 46 fa sub ax, [bp-06h]
2:7274: 1b 56 fc sbb dx, [bp-04h]
2:7277: 89 46 e8 mov [bp-18h], ax
2:727a: 89 56 ea mov [bp-16h], dx
...
2:72ec: 8b 46 e8 mov ax, [bp-18h]
2:72ef: 8b 56 ea mov dx, [bp-16h]
2:72f2: 89 46 e4 mov [bp-1Ch], ax
2:72f5: 89 56 e6 mov [bp-1Ah], dx
...
2:7304: ff 76 e6 push word ptr [bp-1Ah]
2:7307: ff 76 e4 push word ptr [bp-1Ch]
2:730a: ff 76 ee push word ptr [bp-12h]
2:730d: ff 76 ec push word ptr [bp-14h]
2:7310: 9a 6a 6c 0a 72 call 2:6c6a
So the "pointer" makes its way from DX:AX at 2:7274, to SS:[BP-18h] at 2:727a, to SS:[BP-1Ch] at 2:72f5, and eventually onto the stack frame for the called function at 2:7307. Now, an immediate red flag is the use of the sub and sbb instructions! These are "subtract" and "subtract with borrow" respectively, and imply that the contents of DX:AX are being treated as a straightforward 32-bit integer, and not as a far pointer!
Going a bit further back, here's where this 32-bit integer is coming from:
2:7225: c7 46 f0 04 00 mov word ptr [bp-10h], 0004h
2:722a: c4 5e 06 les bx, [bp+06h]
2:722d: 26 ff 77 06 push word ptr es:[bx+06h]
2:7231: 8d 46 f0 lea ax, [bp-10h]
2:7234: 8c d2 mov dx, ss
2:7236: 52 push dx
2:7237: 50 push ax
2:7238: 6a 08 push word 0008h
2:723a: 9a ff ff 00 00 call MMSYSTEM.412 ; <WAVEOUTGETPOSITION>
2:723f: 8b 46 f2 mov ax, [bp-0Eh]
2:7242: 8b 56 f4 mov dx, [bp-0Ch]
2:7245: 89 46 fa mov [bp-06h], ax
2:7248: 89 56 fc mov [bp-04h], dx
...
2:7262: c4 5e 06 les bx, [bp+06h]
2:7265: 26 c4 5f 08 les bx, es:[bx+08h]
2:7269: 26 8b 47 0c mov ax, es:[bx+0Ch]
2:726d: 26 8b 57 0e mov dx, es:[bx+0Eh]
2:7271: 2b 46 fa sub ax, [bp-06h]
2:7274: 1b 56 fc sbb dx, [bp-04h]
It all seems to start with a call to WAVEOUTGETPOSITION, with a stack frame looking like this:
SS:[SP] cbmmt == 8
SS:[SP+2] pmmt == SS:[BP-10h]
SS:[SP+6] hwo == word ptr ES:[BX+06h]
This means that there is an 8-byte MMTIME structure at SS:[BP-10h], laid out something like this:
SS:[BP-10h] wType == 4 == TIME_BYTES
SS:[BP-0Eh] cb == ? (to be filled in by the function call)
SS:[BP-0Ah] (2 padding bytes)
There is also a far pointer at SS:[BP+06h] to some other type of structure. This structure has a HWAVEOUT at offset 6 (hence the use of ES:[BX+06h] after loading the far pointer into ES:BX).
Immediately after the function call, from 2:723f to 2:7248, the cb value in the MMTIME (which has just been filled in by the function) is loaded into DX:AX, and then stored at SS:[BP-06h]. This value is a 32-bit integer representing the number of bytes traversed by the wave device while playing a sound.
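As a rough model of that 8-byte layout, the following Python sketch packs and unpacks an MMTIME image with the `struct` module. The format string and the helper `read_cb` are my own illustration, and the check on wType is an addition of mine that does not appear in the disassembly excerpt above.

```python
import struct

# Win16 MMTIME as laid out above: a 16-bit wType, the 32-bit cb field,
# then the 2 remaining bytes of the structure.
MMTIME_FORMAT = "<HIH"
TIME_BYTES = 0x0004


def read_cb(raw):
    """Return the byte position stored in an 8-byte MMTIME image."""
    w_type, cb, _unused = struct.unpack(MMTIME_FORMAT, raw)
    if w_type != TIME_BYTES:
        # Illustrative check only; not taken from the game's code.
        raise ValueError("time is not expressed in bytes")
    return cb


raw = struct.pack(MMTIME_FORMAT, TIME_BYTES, 0x00012345, 0)
print(len(raw), hex(read_cb(raw)))  # 8 0x12345
```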
Later, at 2:7262, the far pointer at SS:[BP+06h] is again loaded into ES:BX, and from there, another far pointer to some structure is loaded from ES:[BX+08h]. This latter structure has another 32-bit integer at offset 12 (0Ch), which is loaded into DX:AX at 2:7269 and 2:726d. It is from this integer, presumably representing the total number of bytes in the current sound sample, that the cb value is subtracted.
I used a breakpoint to step through this code path a few times, as I played the game. It seems that WAVEOUTGETPOSITION was always getting called when the sample was completely finished, so this subtraction should have been returning a harmless zero. However, about half the time, the function call was reporting cb as one byte more than the sample actually contained! (Presumably, this was due to a lack of sanity checking in the HD Audio driver, which explains why these #GP faults were only happening with that driver.) As a result, the subtraction was returning -1, or, as a 32-bit integer, FFFFFFFFh!
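The arithmetic is easy to reproduce. This Python sketch models the sub/sbb pair on 16-bit halves, with the borrow from the low word carried into the high word. The function name `sub_sbb` and the sample length of 5000 bytes are hypothetical, chosen only to illustrate the off-by-one case.

```python
def sub_sbb(minuend, subtrahend):
    """Subtract two 32-bit values as a sub/sbb pair would.

    Returns (dx, ax): the high and low 16-bit halves of the result.
    """
    low = (minuend & 0xFFFF) - (subtrahend & 0xFFFF)
    borrow = 1 if low < 0 else 0              # carry flag after `sub`
    ax = low & 0xFFFF
    high = ((minuend >> 16) & 0xFFFF) - ((subtrahend >> 16) & 0xFFFF) - borrow
    dx = high & 0xFFFF                        # result of `sbb`
    return dx, ax


sample_length = 5000  # hypothetical sample size in bytes
print(sub_sbb(sample_length, sample_length))      # (0, 0)
print(sub_sbb(sample_length, sample_length + 1))  # (65535, 65535)
```

The second result is exactly the FFFF:FFFF pair that ends up on the stack.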
So now, let's return to the offending sequence of instructions, and include a few more to make things really clear:
2:6c98: c5 4e 0a lds cx, [bp+0Ah]
2:6c9b: 8c da mov dx, ds
2:6c9d: 8b c2 mov ax, dx
2:6c9f: 0b c1 or ax, cx
2:6ca1: 74 39 jz 6cdc
2:6ca3: c5 76 f6 lds si, [bp-0Ah]
2:6ca6: c4 7e fa les di, [bp-06h]
Right, this use of the lds instruction is clearly not sane. The value at SS:[BP+0Ah] is a 32-bit integer, not a far pointer. As such, the upper half is not actually used as a segment – note that it is immediately moved into a general purpose register (DX), and the selector in DS is almost immediately replaced, via another unrelated use of the lds instruction, at 2:6ca3.
This abuse of lds seems to have been a bit of code golf (!) played by someone writing this routine directly in assembly. It would work fine in Real or Virtual 8086 Mode (e.g. in DOS), because any 16-bit integer is acceptable as a segment value. It also usually "works" in Protected Mode, because the samples are short enough that the upper half of the 32-bit integer tends to be zero (which is always valid as a "null" segment selector).
Anyway, after this badly-coded check for zero, what does this routine do if the value is not zero? Well, apparently, it tries to fill in all the bytes in the sound sample buffer that have not been played yet, with some new value. If it overflows a segment while doing this, it tries to move to the next segment:
2:6cb8: 83 c6 01 add si, 01h
2:6cbb: 75 07 jnz 6cc4
2:6cbd: 8c d8 mov ax, ds
2:6cbf: 05 00 10 add ax, 1000h
2:6cc2: 8e d8 mov ds, ax
2:6cc4: 83 c7 01 add di, 01h
2:6cc7: 75 07 jnz 6cd0
2:6cc9: 8c c0 mov ax, es
2:6ccb: 05 00 10 add ax, 1000h
2:6cce: 8e c0 mov es, ax
2:6cd0: e2 d8 loop 6caa
It's copying stuff from DS:SI to ES:DI, and if SI or DI wraps around to zero on increment, it adds 1000h to DS or ES respectively. Again, this would work fine in Real or Virtual 8086 Mode, but not in Protected Mode! Adding 1000h to a Real-Mode segment advances its base by 10000h, compensating for overflow in a 16-bit offset, but adding 1000h to a Protected-Mode selector is undefined (unless you know exactly what's in the Descriptor Tables, which this code does not)!
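The difference between the two modes can be sketched in a few lines of Python. The helper names and the example segment 2000h are my own. The first function is the standard Real-Mode address calculation, and the second extracts the descriptor index that a Protected-Mode selector refers to.

```python
def real_mode_linear(segment, offset):
    """Linear address of segment:offset in Real or Virtual 8086 Mode."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def selector_index(selector):
    """Descriptor table index encoded in a Protected-Mode selector."""
    return (selector & 0xFFFF) >> 3


seg = 0x2000  # hypothetical segment value

# Real Mode: the byte after seg:FFFF is (seg + 1000h):0000.
assert real_mode_linear(seg + 0x1000, 0x0000) == real_mode_linear(seg, 0xFFFF) + 1

# Protected Mode: the same addition just jumps 512 descriptor slots ahead,
# to whatever entry happens to live there.
print(selector_index(seg + 0x1000) - selector_index(seg))  # 512
```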
Again, under normal circumstances, this shouldn't even come up, since the samples in question are so short, but this kind of code simply should not be written when the application can be run in Protected Mode! It's strange, because I found things like this in other parts of the program:
3:08ea: 8c c1 mov cx, es
3:08ec: 81 c1 ff ff add cx, KERNEL.114 ; <__AHINCR>
3:08f0: 8e c1 mov es, cx
This block indicates an awareness in the program at large that running under Protected-Mode Windows is a possibility. Presumably, this sane block was compiler-generated, while, as I said before, the offending function was hand-coded in assembly, and probably copied from some DOS program that always runs in Real or Virtual 8086 Mode…
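For comparison, here is a small Python sketch of a pointer increment that takes the segment step as a parameter, which is what importing __AHINCR achieves. The function `advance_huge` and the example selector are hypothetical. The value 8 for Protected Mode (one descriptor slot) is the commonly documented one for Windows 3.x, but treat it as my assumption rather than something read out of this program.

```python
REAL_MODE_AHINCR = 0x1000       # same constant that is hard-coded at 2:6cbf
PROTECTED_MODE_AHINCR = 0x0008  # assumption: next selector is one slot along


def advance_huge(segment, offset, ahincr):
    """Advance segment:offset by one byte, stepping the segment on wraparound."""
    offset = (offset + 1) & 0xFFFF
    if offset == 0:
        # The offset wrapped, so move on to the next 64 KB tile.
        segment = (segment + ahincr) & 0xFFFF
    return segment, offset


print(advance_huge(0x2000, 0xFFFF, REAL_MODE_AHINCR))       # (12288, 0)
print(advance_huge(0x010F, 0xFFFF, PROTECTED_MODE_AHINCR))  # (279, 0)
```

With the step supplied at load time, the same routine is correct in either mode, whereas the hard-coded 1000h is only correct in Real or Virtual 8086 Mode.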
Anyway, clearly, they never imagined that there would be a negative argument passed to this function. It would run completely wild trying to fill in FFFFh bytes in a buffer and possibly overwrite some critical data structure. Or, in Protected Mode, it would just cause another #GP after running off the end of the buffer segment!