Microsoft Longhorn

Fixing DCE in Longhorn build 3706

DCE (Desktop Composition Engine) is the component of Longhorn responsible for rendering the desktop and all windows using the GPU (unlike previous versions that relied on the CPU). In Longhorn build 3706, DCE was broken, causing BSODs when attempting to activate it on real hardware, though it did work in certain versions of VMware.

Previous fix attempts

I had made a previous attempt to fix the issue in June 2024, and ended up with a one-byte patch that avoided the code that caused the crash, but caused graphical issues while DCE was running.

I will discuss the details of this patch as I discuss the new patch, since the new patch is a more complete fix that also avoids the crash, but without the graphical issues.

I’d been experimenting with AI-assisted reverse engineering in general lately, inspired by several projects that had used it to reverse engineer and reimplement old games. I thought it would be interesting to see if I could use it to reverse engineer the DCE code and figure out what was going wrong, and whether we could come up with a better patch than the one I had previously. I started off using DeepSeek Pro v4, before switching to GPT-5.4 via GitHub Copilot. I provided the models with the crash details, the decompiled win32k code, the original win32k.sys binary, and some comments from my own assessment when I made the previous attempt at a patch.

The Crash

From the previous patch attempt, I had saved some details captured with WinDbg attached to the system when the crash occurred. The bug check is KERNEL_MODE_EXCEPTION_NOT_HANDLED (0x8E): an exception was raised in kernel mode and no exception handler dealt with it, so the system bug-checked (BSOD). The exception code is 0xc0000005, an access violation, meaning that the code attempted to read or write an invalid memory address.

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
Arg1: c0000005, The exception code that was not handled
Arg2: bfaadc76, The address that the exception occurred at
Arg3: b9e67a90, Trap Frame
Arg4: 00000000

STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
b9e67664 805062b3     0000008e c0000005 bfaadc76 nt!KeBugCheckEx+0x19
b9e67a20 8054653e     b9e67a3c 00000000 b9e67a90 nt!KeTerminateThread+0xd87
b9e67aa0 bfb7d526     e1a10a80 e1a10f34 e1a10f74 nt!Kei386EoiHelper+0x1ce
b9e67b54 bf92f455     b9e67c30 00000001 00000000 nv4_disp+0x1cd526
b9e67b90 bff812f2     b9e67c30 e2678008 e23c10e8 win32k!XLATEOBJ_iXlate+0x2e27b
b9e67c6c bf919252     e23c10e8 e23736d8 e23bbef0 dxg+0x12f2
00000000 00000000     00000000 00000000 00000000 win32k!XLATEOBJ_iXlate+0x18078

The symbol shown by WinDbg is misleading here, since we don’t have symbols available for this build. The relevant win32k code is not actually in XLATEOBJ_iXlate; that is simply the nearest preceding public symbol. The return address maps into the DCEComposeFrame subroutine (name inferred from debugging strings in calling code).

What this tells us:

  • The fault is ultimately triggered inside the video driver’s callback path.
  • That does not mean the driver is necessarily at fault - indeed, because it happens across different drivers from different vendors, it seems likely that it’s a win32k bug.
  • This suggests that win32k passes invalid data into the driver callback.
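As an aside on the misleading symbol name: without private symbols, a debugger can only label an address with the closest public symbol at or below it, plus an offset. A minimal sketch of that lookup follows. The PublicSymbol type and nearest_preceding function are hypothetical names used for illustration, not part of WinDbg or any real API. A very large offset from the reported symbol (such as +0x18078 in the trace above) is a hint that the address belongs to some other, unnamed routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical public-symbol record: an address and the exported name. */
typedef struct {
    uint32_t address;
    const char *name;
} PublicSymbol;

/* Return the symbol with the highest address that is still <= addr,
 * or NULL when every symbol lies above addr. The table need not be sorted. */
const PublicSymbol *nearest_preceding(const PublicSymbol *table,
                                      size_t count,
                                      uint32_t addr)
{
    const PublicSymbol *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (table[i].address <= addr &&
            (best == NULL || table[i].address > best->address)) {
            best = &table[i];
        }
    }
    return best;
}
```

The offset shown next to the symbol is then simply addr minus best->address, which says nothing about whether addr is inside that function's body.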

The subroutine that matters: DCEComposeFrame

The important subroutine is DCEComposeFrame in win32k.sys. GPT-5.4 produced the following pseudocode for this subroutine, inferring the names of variables and sub-objects from the behavior of the code.

// Names below are inferred from behavior, not symbols.
// Return value behaves like an HRESULT / NTSTATUS-style signed status.

int DCEComposeFrame(DceComposition *composition)
{
    int status;
    void *targetLockHandle = NULL;
    unsigned char targetCookie[0x34];

    // composition->deviceContextLike points at an object whose +0x40 area
    // contains the sprite list head used later.
    DceSpriteNode *sprite = composition->deviceContextLike->spriteListHead;

    // composition->composeState is a small vtable-driven helper object used to
    // set up the actual full-frame draw state before calling into the driver.
    ComposeState *state = composition->composeState;

    // Step 1: ask the DCE backend / PDEV helper to open or prepare
    // the frame target. In the binary this is [pdev+2E4].
    status = pdev->OpenComposeTarget(
        composition->targetObject,
        targetCookie,
        &targetLockHandle
    );

    if (status < 0) {
        // Failure here is a normal skip path: control jumps past the
        // full-frame compose to the sprite pass and cleanup.
        goto SpritePass;
    }

    // Step 2: initialize the compose-state object.
    state->Begin();

    // Step 3: select which underlying composition ordinal / surface slot is
    // active. This was initially suspected to be the bug, but forcing
    // either side did not help.
    state->SelectSurface(
        composition->surfaceOrdinals[0x26B - composition->surfaceToggle],
        0
    );

    // Step 4: program a handful of fixed render-state values into the
    // compose state.
    state->SetState(0x89, 0);
    state->SetState(0x13, 5);
    state->SetState(0x14, 6);
    state->SetState(0x1B, 1);
    state->SetState(0x16, 1);

    // Step 5: set up a transform / clear-like state block.
    state->SetTransformOrViewport(
        composition->frameParam10,
        composition->frameParam14,
        1,
        0xFF000000
    );

    // Step 6: perform the actual full-frame compose into the destination.
    // In the binary this is [pdev+2F4].
    // This is the path that crashes on real hardware.
    status = pdev->ComposeFrame(
        state,                      // pushed last, first logical argument
        composition->targetSurface, // [composition+3Ch]
        targetLockHandle,           // returned by OpenComposeTarget
        NULL,                       // special frame-path argument A
        0x1C,
        0x144,
        NULL                        // special frame-path argument B
    );

SpritePass:
    // Control reaches here whether or not the full-frame compose ran.
    // Sprites are composited only while status is non-negative, so a
    // failed open or compose exits the loop on its first iteration.
    while (sprite != NULL) {
        if (status < 0) {
            break;
        }

        status = DCEComposeSprite(
            composition,
            sprite,
            state,
            targetLockHandle
        );

        sprite = sprite->next;
    }

    // Step 7: close / release the frame target.
    // In the binary this is [pdev+2EC].
    pdev->CloseComposeTarget(targetCookie, targetLockHandle);

    return status;
}

Back to the previous attempt

The earlier attempt made the following change:

test    eax, eax
jl      loc_BF91926B

to:

test    eax, eax
jno     loc_BF91926B

That effectively makes the branch unconditional there, because test clears the overflow flag.

This has the effect of changing the status check after the OpenComposeTarget call, so that it always skips the full-frame compose and goes straight to the sprite composition pass. This avoids the crash, but causes graphical issues because the full-frame compose is not happening.

That explains the observed behavior exactly:

  • full-frame composition is skipped entirely
  • only sprite/overlay composition still runs
  • some moving content appears
  • the base frame is black or stale

That confirmed that the crashing call was in the full-frame compose path, but it was not a viable workaround.
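For reference, the reason jno is always taken after test can be modelled in a few lines of C. This is a sketch of the documented x86 flag semantics only (TEST computes a bitwise AND, sets SF and ZF from the result, and clears OF and CF; JL is taken when SF != OF; JNO is taken when OF is 0). The Flags struct and the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* The subset of EFLAGS that matters for JL and JNO. */
typedef struct {
    bool sf; /* sign flag: top bit of the result */
    bool zf; /* zero flag: result == 0 */
    bool of; /* overflow flag */
    bool cf; /* carry flag */
} Flags;

/* Model of "test a, b": AND the operands, discard the result,
 * set SF/ZF from it, and always clear OF and CF. */
Flags flags_after_test(uint32_t a, uint32_t b)
{
    uint32_t r = a & b;
    Flags f;
    f.sf = ((r >> 31) & 1u) != 0;
    f.zf = (r == 0);
    f.of = false;
    f.cf = false;
    return f;
}

/* JL (jump if less, signed): taken when SF != OF. */
bool jl_taken(Flags f)  { return f.sf != f.of; }

/* JNO (jump if not overflow): taken when OF == 0. */
bool jno_taken(Flags f) { return !f.of; }
```

With eax holding a negative status, both branches are taken; with a zero or positive status, only jno is taken, which is why the patched code skips the full-frame compose even when OpenComposeTarget succeeds.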

The key clue: DCEComposeSprite uses the same callback successfully

The decisive clue came from comparing DCEComposeFrame with DCEComposeSprite.

DCEComposeSprite also calls the same callback slot, pdev->ComposeFrame / [pdev+2F4], but it does not call it with null command data.

In C-like terms, the sprite path looks more like this:

int DCEComposeSprite(
    DceComposition *composition,
    DceSpriteNode *sprite,
    ComposeState *state,
    void *targetLockHandle)
{
    PrimitiveBuffer *primitiveBuffer;
    int primitiveCount;

    primitiveBuffer = BuildSpritePrimitiveBuffer(sprite);
    primitiveCount = ComputePrimitiveCount(sprite);

    ProgramSpriteRenderState(state, sprite);

    return pdev->ComposeFrame(
        state,
        composition->targetSurface,
        targetLockHandle,
        primitiveBuffer,
        0x1C,
        0x144,
        primitiveCount
    );
}

The important difference is this:

  • sprite path passes a real primitive buffer and a nonzero count
  • frame path passes two null-ish special arguments

That means the driver callback itself is not inherently unusable. It already works in the sprite path. Presumably the VMware SVGA emulated hardware or its driver is more tolerant of the null/zero arguments here than real hardware and drivers are, which would explain why the crash only happens on real hardware.

So to get a working fix, we need to stop win32k.sys from passing a null buffer pointer into that callback.

The fix

There was not enough inline space to rewrite the frame callback argument setup directly, so we need to use what’s known as a “code cave” - an unused area of code or data in the binary that we can repurpose to store some custom code, and then jump to it from the original code.
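A common place to find a code cave is the padding at the end of a section. In a PE section header, VirtualSize is the number of bytes the section actually uses, while SizeOfRawData is rounded up to the file alignment, so the difference is padding that already exists in the file. The sketch below computes that slack. The helper names are hypothetical, and it assumes power-of-two alignments; it is not taken from the actual patch tooling.

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment.
 * Assumes alignment is a nonzero power of two. */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1u) & ~(alignment - 1u);
}

/* Padding bytes present in the file after the section's real contents. */
uint32_t file_slack(uint32_t virtual_size, uint32_t size_of_raw_data)
{
    return size_of_raw_data > virtual_size
               ? size_of_raw_data - virtual_size
               : 0u;
}

/* Bytes between the end of the section's contents and the end of its
 * last page (or section-alignment unit) in memory. */
uint32_t page_slack(uint32_t virtual_size, uint32_t section_alignment)
{
    return align_up(virtual_size, section_alignment) - virtual_size;
}
```

For example, a section with VirtualSize 0x1F40 and SizeOfRawData 0x2000 has 0xC0 bytes of padding in the file. Raising VirtualSize to cover those bytes is one way to make sure the loader treats them as part of the section.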

The working binary patch does this:

  1. expands .text virtual size so end-of-section slack is mapped
  2. replaces the inline full-frame callback sequence with a trampoline call
  3. the trampoline rebuilds the original callback argument block
  4. the trampoline replaces the null primitive-buffer pointer with the address of a dummy buffer in the newly mapped slack space
  5. the callback is then invoked exactly as before

So the patched logic is effectively:

status = pdev->ComposeFrame(
    state,
    composition->targetSurface,
    targetLockHandle,
    dummyMappedBuffer,
    0x1C,
    0x144,
    NULL
);

This preserves DCE and preserves the original frame path. It only stops the driver from receiving a null pointer in the position that appears to be unsafe on real hardware.
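Redirecting inline code to a trampoline comes down to overwriting the original instruction sequence with a near call and padding the remainder. The following sketch assumes the redirect is a 5-byte x86 CALL rel32 (opcode E8) followed by NOPs (0x90); whether the actual patch uses a call or a jmp, and the addresses involved, are not stated here, so encode_call_patch and its parameters are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Write "call target_va" at the start of out, as if out were located at
 * site_va, and fill the rest of the span with NOPs.
 * Returns 0 on success, -1 if the span cannot hold the 5-byte call. */
int encode_call_patch(uint8_t *out, size_t span,
                      uint32_t site_va, uint32_t target_va)
{
    if (span < 5) {
        return -1;
    }

    /* rel32 is relative to the address of the NEXT instruction.
     * Unsigned arithmetic wraps modulo 2^32, which yields the correct
     * two's-complement displacement for backward calls. */
    uint32_t rel = target_va - (site_va + 5u);

    out[0] = 0xE8;                         /* CALL rel32 */
    out[1] = (uint8_t)(rel & 0xFFu);       /* little-endian displacement */
    out[2] = (uint8_t)((rel >> 8) & 0xFFu);
    out[3] = (uint8_t)((rel >> 16) & 0xFFu);
    out[4] = (uint8_t)((rel >> 24) & 0xFFu);

    memset(out + 5, 0x90, span - 5);       /* NOP out the leftover bytes */
    return 0;
}
```

The trampoline at the target address then has to rebuild whatever the overwritten instructions did before making the real callback and returning.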

The one additional step is to then fix the PE checksum, which is required for win32k.sys to load in Longhorn builds.
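The checksum lives in the CheckSum field of the PE optional header, at file offset e_lfanew + 0x58. Below is a sketch of the algorithm as it is commonly described: a 16-bit ones'-complement-style sum over the file with the CheckSum field treated as zero, plus the file length. It is assumed, not verified here, to match what imagehlp's CheckSumMappedFile produces. The function name pe_checksum is hypothetical, and the caller is expected to supply the (even) offset of the CheckSum field.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute a PE-style checksum over image[0..len).
 * checksum_offset is the file offset of the 4-byte CheckSum field
 * (assumed even); those bytes are treated as zero. */
uint32_t pe_checksum(const uint8_t *image, size_t len, size_t checksum_offset)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i += 2) {
        uint32_t word = 0;

        /* Skip the two 16-bit words that hold the stored checksum. */
        if (!(i >= checksum_offset && i < checksum_offset + 4)) {
            word = image[i];
            if (i + 1 < len) {
                word |= (uint32_t)image[i + 1] << 8; /* little-endian */
            }
            /* An odd trailing byte is padded with a zero high byte. */
        }

        sum += word;
        sum = (sum & 0xFFFFu) + (sum >> 16); /* fold the carry back in */
    }

    sum = (sum & 0xFFFFu) + (sum >> 16);     /* final fold to 16 bits */
    return sum + (uint32_t)len;              /* add the file length */
}
```

After patching, the result would be written back into the CheckSum field as a little-endian 32-bit value.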

With this patch, the DCE now works roughly in line with how you would expect it to work on a build of this era. It’s not perfect - there are still some graphical glitches, but the desktop is fully composited and usable, and there are no BSODs.

3706 DCE Fix Patch

Thoughts

While AI is increasingly used in other hobbyist reverse engineering projects, I’m not sure this has been done before within the Windows beta community, and I hope that this encourages other people to experiment with it. I picked this project precisely because I already had some familiarity with the issue, but I didn’t necessarily think a fix would be doable. It did take some time investment and trial and error, but I think the end result is satisfactory. Perhaps more significantly, it would probably have been beyond my skill to arrive at this patch entirely on my own.

DeepSeek ended up not returning a useful answer for me, but I wouldn’t necessarily hold that against it. It was my first attempt and, in hindsight, I probably didn’t provide it with the right information or prompts to get a useful answer. I then took everything that I’d learned from that DeepSeek attempt and used it to give Copilot / GPT-5.4 a much better starting point, with much more context and useful information.