64-Bit Windows Shellcoding!
Environment Setup
OS: Windows 10 64-Bit
Tools:
Introduction
Previously we’ve been using shellcode generated by msfvenom to accomplish our objectives.
While that’s easy and effective, it doesn’t teach us anything about how shellcode actually works.
So this time round let’s try to write our own shellcode!
Writing good shellcode on the Windows platform is much more complicated than on Linux.
On Linux we can just syscall our way through everything, but that won’t work on Windows.
This is because
- Windows doesn’t document its syscall numbers so we have to manually figure them out in a debugger
- The syscall numbers can change on different versions of Windows, and hardcoded values will not be portable
Before getting all technical, let’s first examine the attributes of a good shellcode.
Attributes of a good shellcode
- Portable
The shellcode should resolve the addresses of APIs and libraries at runtime, with no hardcoded addresses.
- Position Independent
The shellcode (assembly) should be able to work out the addresses of its own labels at runtime.
- NULL Free
This isn’t strictly required for our purpose, but it matters in exploitation scenarios where a NULL byte would cut the payload short (e.g. a strcpy buffer overflow).
With these in mind, we can move on to some Windows Internals required to understand shellcoding.
Windows Internals
PEB
Every process in Windows has its own Process Environment Block (PEB).
The PEB is a structure that stores data about the current process.
Here’s its prototype:
0:000> dt -r nt!_PEB 27d000
As seen above, the PEB is a large and complex structure with lots of nested structures within it.
However, its complexity is also extremely useful when enumerating an application, as it helps us locate key data structures.
The PEB itself can be found at fs:[0x30] for 32-Bit processes and gs:[0x60] for 64-Bit processes.
PEB_LDR_DATA
Ldr (short for loader) is a member of the PEB struct, at offset 0x18 for 64-Bit.
It is a pointer to a PEB_LDR_DATA, which is a struct of its own.
0:000> dt -r nt!_PEB_LDR_DATA 7ff89701a4c0
This struct is important because it holds 3 doubly linked lists (InLoadOrderModuleList and friends).
These 3 lists are of type LIST_ENTRY, and a new node is added to each list every time a new library is loaded by the process.
Take note of the InInitializationOrderModuleList at offset 0x30 of the Ldr.
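A LIST_ENTRY might look useless on its own, since it only holds a Flink and a Blink pointer, but every node is embedded in a larger struct of type LDR_DATA_TABLE_ENTRY.
Specifically, the initialization-order links (InInitializationOrderLinks) sit at offset 0x20 of the LDR_DATA_TABLE_ENTRY, so subtracting 0x20 from a node's address gives the start of its entry.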
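To make the offset arithmetic concrete, here's a minimal C sketch of stepping back from a list node to the struct that contains it. The list_entry and ldr_entry types and the entry_from_init_links helper are simplified stand-ins made up for illustration, not the real Windows definitions; only the first few fields are modelled, and the 0x20 offset assumes a 64-Bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the real structures (hypothetical names).
   Only the leading fields are modelled; on a 64-bit build each
   list_entry is 16 bytes, so init_links lands at offset 0x20. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

typedef struct ldr_entry {
    list_entry load_links;   /* 0x00 on 64-bit */
    list_entry memory_links; /* 0x10 on 64-bit */
    list_entry init_links;   /* 0x20 on 64-bit */
    void *dll_base;          /* 0x30 on 64-bit */
} ldr_entry;

/* Step back from an embedded list node to the entry that contains it. */
ldr_entry *entry_from_init_links(list_entry *node)
{
    assert(node != NULL);
    return (ldr_entry *)((uint8_t *)node - offsetof(ldr_entry, init_links));
}
```

Every Flink in the initialization-order list points at the init_links field of the next entry, so the same subtraction is applied to each node while walking the list.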
Let’s take the first Flink of the initorder list shown above as an example.
LDR_DATA_TABLE_ENTRY
0:000> dt nt!_LDR_DATA_TABLE_ENTRY e3350-0x20
Ah! So this is the data entry for ntdll.dll!
Apart from its name, we also know the load address of the module from its DllBase member.
With this info, we can locate the base address of any module loaded by the process.
More importantly, after locating the base address of kernel32.dll, we can invoke LoadLibraryA to load arbitrary modules.
Now the question is, how do we resolve the addresses of functions inside a module, given the module’s base address?
IMAGE_DOS_HEADER
At the start of every library/executable’s load address, there is a header of type IMAGE_DOS_HEADER.
Take kernel32.dll as an example:
0:002> lm m kernel32
This is the old DOS header, which contains information such as the “MZ” magic bytes that let programs recognise the executable/DLL file format.
At offset 0x3c (the e_lfanew field) is the offset, relative to the module base, of the new executable header, aka the PE Header or, as Microsoft names the type, IMAGE_NT_HEADERS.
IMAGE_NT_HEADERS
0:002> dt nt!_IMAGE_NT_HEADERS64 7ff896b00000+0n232
Note: On 64-Bit it’s called IMAGE_NT_HEADERS64.
At offset 0x18 of that struct is the OptionalHeader member, of type IMAGE_OPTIONAL_HEADER64.
Let’s see how that can help us.
IMAGE_OPTIONAL_HEADER64
0:002> dt nt!_IMAGE_OPTIONAL_HEADER64 7ff896b00000+0n232+0x18
Cool, offset 0x70 of the optional header is an array called DataDirectory, which contains at most 16 entries of type IMAGE_DATA_DIRECTORY.
IMAGE_DATA_DIRECTORY
0:002> dt nt!_IMAGE_DATA_DIRECTORY 7ff896b00000+0n232+0x18+0x70
Each IMAGE_DATA_DIRECTORY entry is 8 bytes in total (a 4-byte VirtualAddress and a 4-byte Size), and contains the address of the actual table in memory, relative to the module base address.
winnt.h:
In winnt.h we can find the table that each index represents.
Index 0, which is the first entry, contains the relative address of the Export Directory Table, of type IMAGE_EXPORT_DIRECTORY.
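Chaining those three offsets together looks roughly like this in C. This is only a sketch that reads from a plain byte buffer standing in for the mapped module; read_u32 and export_directory_rva are hypothetical helper names, and the offsets are the 64-Bit ones quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Offsets quoted in the text for a 64-bit image. */
enum {
    DOS_E_LFANEW       = 0x3c, /* IMAGE_DOS_HEADER -> offset of NT headers */
    NT_OPTIONAL_HEADER = 0x18, /* IMAGE_NT_HEADERS64 -> OptionalHeader */
    OPT_DATA_DIRECTORY = 0x70  /* IMAGE_OPTIONAL_HEADER64 -> DataDirectory */
};

/* Return the relative address stored in DataDirectory[0] (the export table). */
uint32_t export_directory_rva(const uint8_t *image)
{
    assert(image != NULL);
    uint32_t nt = read_u32(image + DOS_E_LFANEW);
    return read_u32(image + nt + NT_OPTIONAL_HEADER + OPT_DATA_DIRECTORY);
}
```

For the kernel32.dll session in this post the NT headers sit at offset 232 (0xe8), so the export entry would be read from base + 0xe8 + 0x18 + 0x70, i.e. base + 0x170.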
IMAGE_EXPORT_DIRECTORY
WinDbg doesn’t seem to contain information regarding this type, but you can find it in Microsoft’s documentation.
typedef struct _IMAGE_EXPORT_DIRECTORY
This structure has multiple useful fields that can aid us in resolving a function’s address.
AddressOfFunctions is the relative address of an array of DWORDs.
Each entry in that array contains the relative address of a function, measured from the module’s base address.
AddressOfNames and AddressOfNameOrdinals work the same way as AddressOfFunctions: each is the relative address of an array, holding name addresses (DWORDs) and indices into AddressOfFunctions (WORDs) respectively.
Below is the procedure for resolving a function’s load address:
- Iterate through the AddressOfNames array to locate the desired function name. Take note of its index in the array, for example index 5.
- The same index (5) in the AddressOfNameOrdinals array will contain a new index (let’s say 15).
- That new index (15) in the AddressOfFunctions array will contain the relative address of the function.
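The three steps above can be sketched in C as follows. This is an illustration over a byte buffer rather than the shellcode itself: load_u16, load_u32 and resolve_export are hypothetical names, the field offsets are those of IMAGE_EXPORT_DIRECTORY, and forwarded exports are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian loads from a byte buffer. */
uint16_t load_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t load_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Field offsets inside IMAGE_EXPORT_DIRECTORY. */
enum {
    EXP_NUMBER_OF_NAMES  = 0x18,
    EXP_ADDR_OF_FUNCS    = 0x1c,
    EXP_ADDR_OF_NAMES    = 0x20,
    EXP_ADDR_OF_ORDINALS = 0x24
};

/* Return the function's relative address, or 0 if the name is not exported.
   `base` is the module base, `export_rva` the export directory's offset. */
uint32_t resolve_export(const uint8_t *base, uint32_t export_rva,
                        const char *wanted)
{
    assert(base != NULL && wanted != NULL);
    const uint8_t *dir = base + export_rva;
    uint32_t count = load_u32(dir + EXP_NUMBER_OF_NAMES);
    const uint8_t *names    = base + load_u32(dir + EXP_ADDR_OF_NAMES);
    const uint8_t *ordinals = base + load_u32(dir + EXP_ADDR_OF_ORDINALS);
    const uint8_t *funcs    = base + load_u32(dir + EXP_ADDR_OF_FUNCS);

    for (uint32_t i = 0; i < count; i++) {
        /* Step 1: names[i] is the relative address of a C string. */
        const char *name = (const char *)(base + load_u32(names + 4 * i));
        if (strcmp(name, wanted) != 0)
            continue;
        /* Step 2: the same index in the ordinal array gives a new index. */
        uint16_t ordinal = load_u16(ordinals + 2 * i);
        /* Step 3: that index in the function array gives the address. */
        return load_u32(funcs + 4 * (uint32_t)ordinal);
    }
    return 0;
}
```

Adding the returned value to the module base gives the function's absolute address. The shellcode does the same walk, except that it compares hashes instead of calling strcmp.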
Let’s try that in WinDbg to locate the absolute address of the 7th function in AddressOfNames.
Dump DWORDs at the Export Directory Table:
0:002> dd 7ff896b00000+9a1e0
00007ff8`96b9a1e0 00000000 1a965747 00000000 0009e1d2
00007ff8`96b9a1f0 00000001 00000661 00000661 0009a208
00007ff8`96b9a200 0009bb8c 0009d510 0009e1f7 0009e22d
00007ff8`96b9a210 00020080 0001b700 0005a140 000128f0
00007ff8`96b9a220 00025640 00025650 0009e2b3 0003cce0
00007ff8`96b9a230 0005a280 0005a2e0 00022270 0001e2c0
00007ff8`96b9a240 0003a620 000208a0 0003a640 00038a40
00007ff8`96b9a250 0009e3ec 0009e42c 00007200 00025290
At offset 0x20 is the pointer to the AddressOfNames array.
Dump DWORDs at the AddressOfNames array:
0:002> dd 7ff896b00000+9bb8c
00007ff8`96b9bb8c 0009e1df 0009e218 0009e24b 0009e25a
00007ff8`96b9bb9c 0009e26f 0009e278 0009e281 0009e292
00007ff8`96b9bbac 0009e2a3 0009e2e8 0009e30e 0009e32d
00007ff8`96b9bbbc 0009e34c 0009e359 0009e36c 0009e384
00007ff8`96b9bbcc 0009e39f 0009e3b4 0009e3d1 0009e410
00007ff8`96b9bbdc 0009e451 0009e464 0009e471 0009e48b
00007ff8`96b9bbec 0009e4a9 0009e4e0 0009e525 0009e570
00007ff8`96b9bbfc 0009e5cb 0009e620 0009e673 0009e6c8
Each entry contains the relative address of the actual name of the function.
Get the name of the 7th function (so index 6):
0:000> da 7ff896b00000+9e281
00007ff8`96b9e281 "AddConsoleAliasA"
Get its ordinal index:
0:002> dd 7ff896b00000+9d510
00007ff8`96b9d510 00010000 00030002 00050004 00070006
00007ff8`96b9d520 00090008 000b000a 000d000c 000f000e
00007ff8`96b9d530 00110010 00130012 00150014 00170016
00007ff8`96b9d540 00190018 001b001a 001d001c 001f001e
00007ff8`96b9d550 00210020 00230022 00250024 00270026
00007ff8`96b9d560 00290028 002b002a 002d002c 002f002e
00007ff8`96b9d570 00310030 00330032 00350034 00370036
00007ff8`96b9d580 00390038 003b003a 003d003c 003f003e
Each entry is a WORD, and the WORD at index 6 is 0x0006, so our index will still be 6.
Get its address:
0:000> dd 7ff896b00000+9a208
00007ff8`96b9a208 0009e1f7 0009e22d 00020080 0001b700
00007ff8`96b9a218 0005a140 000128f0 00025640 00025650
00007ff8`96b9a228 0009e2b3 0003cce0 0005a280 0005a2e0
00007ff8`96b9a238 00022270 0001e2c0 0003a620 000208a0
00007ff8`96b9a248 0003a640 00038a40 0009e3ec 0009e42c
00007ff8`96b9a258 00007200 00025290 0003a680 0003a660
00007ff8`96b9a268 0009e4bf 0009e4fd 0009e545 0009e598
00007ff8`96b9a278 0009e5f0 0009e644 0009e698 0009e6e3
0:000> u 7ff896b00000+25640
KERNEL32!AddConsoleAliasA:
00007ff8`96b25640 ff2552c70500 jmp qword ptr [KERNEL32!_imp_AddConsoleAliasA (00007ff8`96b81d98)]
00007ff8`96b25646 cc int 3
00007ff8`96b25647 cc int 3
00007ff8`96b25648 cc int 3
00007ff8`96b25649 cc int 3
00007ff8`96b2564a cc int 3
00007ff8`96b2564b cc int 3
00007ff8`96b2564c cc int 3
Unassembling in WinDbg confirms that we have successfully resolved the function.
Summary
To summarise:
- Resolve the base address of the desired module using the PEB, Ldr and list structures.
- Locate that module’s Export Directory Table using its headers.
- Use the three arrays to resolve the function’s absolute address.
Now let’s start actually writing the shellcode.
Shellcoding
loader.c
I’ll use this to easily debug the shellcode in WinDbg.
Resolve kernel32.dll Base
__asm__
Instead of performing a sub rsp, 0x7e0 to make space on the stack, we can add a huge number (0xfffffffffffff820) that overflows the 64-Bit register and achieves the same result.
This avoids the null bytes that arise because 0x7e0 is encoded as a 32-Bit immediate (e0 07 00 00), which is then sign-extended to operate on the 64-Bit register.
Alternatively, we can use sub sp, 0x7e0, which takes a 16-Bit immediate.
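Here's a small C sketch that checks both claims: the byte encodings of the three candidate instructions, and the wrap-around arithmetic. The array and function names are made up for illustration, and the bytes are what I'd expect a typical assembler to emit for these instructions, so treat them as an assumption to verify against your own toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected machine-code encodings of the three candidate instructions. */
const uint8_t SUB_RSP_7E0[] = {0x48, 0x81, 0xec, 0xe0, 0x07, 0x00, 0x00};
const uint8_t ADD_RSP_NEG[] = {0x48, 0x81, 0xc4, 0x20, 0xf8, 0xff, 0xff};
const uint8_t SUB_SP_7E0[]  = {0x66, 0x81, 0xec, 0xe0, 0x07};

/* Return 1 if any byte of the encoding is zero. */
int has_null_byte(const uint8_t *code, size_t len)
{
    assert(code != NULL);
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            return 1;
    return 0;
}

/* Adding 2^64 - 0x7e0 wraps around to the same value as subtracting 0x7e0. */
uint64_t alloc_by_add(uint64_t rsp)
{
    return rsp + UINT64_C(0xfffffffffffff820);
}
```

One thing to keep in mind with the sub sp variant is that it only touches the low 16 bits of rsp, so it quietly assumes those bits are at least 0x7e0 to begin with.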
In parse_next_module, we repeatedly check if the 25th and 26th bytes of the module name are null.
This is because we want to find kernel32.dll, which is 13 bytes including the terminating null.
However, because the name is stored as UTF-16, each character takes a word (2 bytes) instead of a byte.
Hence we are checking for the terminating null at the end of the name, which sits right after the first 24 bytes.
(I think this is slightly sloppy, since any module with a 12-character name would also pass, but oh well it works)
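To see why the check is a bit loose, here's the same test written in C. passes_length_check is a hypothetical name, and the function only mirrors the idea described above (look at the 13th UTF-16 unit), not the exact instructions in the shellcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "kernel32.dll" is 12 characters, so in UTF-16 the terminator is the
   13th 16-bit unit: bytes 25 and 26 when counting from 1. */
enum { KERNEL32_NAME_CHARS = 12 };

/* Mirror of the length-only test: true when the 13th unit is zero.
   The caller must pass a buffer of at least 13 units. */
int passes_length_check(const uint16_t *name)
{
    assert(name != NULL);
    return name[KERNEL32_NAME_CHARS] == 0;
}
```

kernel32.dll passes and KERNELBASE.dll (14 characters) does not, but so would something like advapi32.dll, which also happens to be 12 characters long. The check works because kernel32.dll is the first module of that length the walk runs into.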
Let’s check the results in WinDbg. The base address should be stored in rbx.
Breakpoint 0 hit
Success!
Compute Hash
Instead of loading function names as strings and comparing them in assembly to locate desired functions, it will be much easier if we hash the function names into 64-Bit numbers.
"compute_hash:;"
I’ll use a simple algorithm that adds a byte from the string to a sum, then rotates the sum right by 13 bits, repeating for every byte of the name.
This small routine will be used by our shellcode to hash the names of functions in the AddressOfNames array.
Of course, we will need to calculate the hash for our desired function beforehand, so we can hardcode that for comparison.
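As a rough illustration, the algorithm as described could look like this in C. The 64-Bit accumulator and the add-then-rotate order are taken from the description above; ror64 and hash_name are hypothetical names, and the values it produces only match the shellcode if the assembly routine uses the same width and order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 64-bit value right by n bits (0 < n < 64). */
uint64_t ror64(uint64_t value, unsigned n)
{
    return (value >> n) | (value << (64 - n));
}

/* Add each byte of the name to the sum, then rotate the sum right by 13. */
uint64_t hash_name(const char *name)
{
    assert(name != NULL);
    uint64_t sum = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        sum += *p;
        sum = ror64(sum, 13);
    }
    return sum;
}
```

Because of the rotation the result depends on the order of the bytes, so "AB" and "BA" hash to different values even though they contain the same characters.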
compute_hash.c
Now we can move on to code the actual function lookup routine.
Find Function
"find_function_jmp:;"
The first few lines are responsible for achieving position independence.
A call instruction pushes the address of the instruction that follows it, so placing the call right before find_function stores the address of the first instruction of find_function on the stack.
We can store it in r13, so our future calls to r13 will be NULL free.
The jmp before the call is to ensure that no NULL bytes exist in our first call, since we are calling a negative offset.
Afterwards, find_function will retrieve the desired function address if found and return it.
Since we are writing our own assembly, we don’t have to care about calling conventions, and I just chose to store the argument (hash) in r15.
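The reason a negative offset helps becomes obvious once the call is encoded. Below is a small C sketch that builds the 5-byte call rel32 encoding and counts its zero bytes; encode_call and count_null_bytes are hypothetical helpers, and the displacements used with them (0x20 and -0x20) are arbitrary examples, not values from the shellcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `call rel32`: opcode 0xe8 followed by a little-endian
   32-bit displacement measured from the end of the instruction. */
void encode_call(int32_t displacement, uint8_t out[5])
{
    assert(out != NULL);
    uint32_t d = (uint32_t)displacement;
    out[0] = 0xe8;
    out[1] = (uint8_t)(d & 0xff);
    out[2] = (uint8_t)((d >> 8) & 0xff);
    out[3] = (uint8_t)((d >> 16) & 0xff);
    out[4] = (uint8_t)((d >> 24) & 0xff);
}

/* Count the zero bytes in an encoded call. */
int count_null_bytes(const uint8_t code[5])
{
    int zeros = 0;
    for (size_t i = 0; i < 5; i++)
        zeros += (code[i] == 0x00);
    return zeros;
}
```

A short forward call such as +0x20 encodes as e8 20 00 00 00, while a short backward call such as -0x20 encodes as e8 e0 ff ff ff. The upper bytes of a small negative displacement are all 0xff, so only the low byte needs watching.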
Invoke Function
"resolve_symbols:;"
The final part is calling any desired function.
This is slightly tricky, because some functions have weird arguments (like empty structures) and we have to adjust the stack accordingly.
Since we are using pure assembly, it is likely that we messed something up along the way, which might cause a segfault inside the API functions.
We’ll have to deal with that on a case by case basis with the help of WinDbg.
An example is the WinExec call above. WinExec expects the stack to be 16-byte aligned, or else one of its internal xmm instructions will fault.
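One common way to guarantee that alignment, which may or may not be what the code above does, is to mask off the low four bits of the stack pointer before the call (and rsp, 0xfffffffffffffff0). The arithmetic is sketched below in C; align_down_16 and is_aligned_16 are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Round a stack pointer down to the nearest multiple of 16,
   the arithmetic equivalent of `and rsp, 0xfffffffffffffff0`. */
uint64_t align_down_16(uint64_t rsp)
{
    uint64_t aligned = rsp & ~UINT64_C(0xf);
    /* The result never moves up and never moves by 16 bytes or more. */
    assert(aligned <= rsp && rsp - aligned < 16);
    return aligned;
}

/* Return 1 when the pointer is already 16-byte aligned. */
int is_aligned_16(uint64_t rsp)
{
    return (rsp & 0xf) == 0;
}
```

Rounding down is the safe direction here, since the stack grows towards lower addresses and the extra few bytes are simply left unused.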
This post is getting long so I’m gonna stop.
Anyway, with some knowledge of Windows internals and assembly we can make our own shellcode, yay!