Symbolic Execution 3

2022-12-29

Testing KLEE against a variety of software

Introduction

After going through all the theory in the previous post, let’s see how KLEE performs against ordinary toy programs, a buggy toy program, a toy password checker and an open source Excel library.

KLEE Setup

Download KLEE as a docker image:

docker pull klee/klee

Create a persistent container using the downloaded image:

docker run -v "C:\Users\User\Desktop\hacking\klee":/host -ti --name=klee_261222 --ulimit="stack=-1:-1" klee/klee

Notice we didn’t use --rm, so the container will not be destroyed when we exit from it. We also gave the container a name using the --name flag.

Note the --ulimit option sets an unlimited stack size inside the container. This is to avoid stack overflow issues when running KLEE.

The -v option mounts the host folder into the docker container at /host, so we can access source files to audit with KLEE.

If this worked correctly your shell prompt will have changed and you will be the klee user.

To exit, just use the exit command.

List all containers with

docker ps -a

We can always start the container again with

docker start -ai klee_261222

To delete the container:

docker rm klee_261222

Test1

#include <klee/klee.h>

/*
 * First KLEE tutorial: testing a small function
 * http://klee.github.io/tutorials/testing-function/
 */


int get_sign(int x) {
	if (x == 0)
		return 0;

	if (x < 0)
		return -1;
	else
		return 1;
}

int main(void) {
	int a;
	klee_make_symbolic(&a, sizeof(a), "a");
	return get_sign(a);
}

In this small C program we are testing the get_sign function, which has 3 possible paths.

In order to test this function with KLEE, we need to run it on symbolic input.
To mark a variable as symbolic, we use the klee_make_symbolic() function (defined in klee/klee.h), which takes three arguments:

the address of the variable (memory location) that we want to treat as symbolic
the size of the variable
a name (can be anything)

Now we need to compile this code into LLVM bitcode.

clang -I /home/klee/klee_src/include -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone test1.c

The -I argument is used so that the compiler can find klee/klee.h, which contains definitions for the intrinsic functions used to interact with the KLEE virtual machine, such as klee_make_symbolic.

It is useful to build with -g to add debug information to the bitcode file, which we use to generate source line level statistics information.

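-O0 compiles without optimizations, and -Xclang -disable-O0-optnone stops clang from marking every function as optnone, so KLEE can still apply its own optimizations later. With -c we only compile and don’t link.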

Now we see test1.bc

klee@9523002f7332:/host/test1$ ls
test1.bc  test1.c

Note that some of KLEE’s security checks require clang to be invoked with certain compilation flags.

For example the -fsanitize=signed-integer-overflow flag is required to detect signed integer overflow. These clang options instrument program.bc with overflow checks that are used by KLEE.

Using llvm-dis we can check out the underlying bitcode

klee@9523002f7332:/host/test1$ llvm-dis test1.bc 
klee@9523002f7332:/host/test1$ ls
test1.bc  test1.c  test1.ll
klee@9523002f7332:/host/test1$ cat test1.ll 
; ModuleID = 'test1.bc'
source_filename = "test1.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [2 x i8] c"a\00", align 1

; Function Attrs: noinline nounwind uwtable
define dso_local i32 @get_sign(i32 %x) #0 !dbg !7 {
entry:
  %retval = alloca i32, align 4
  %x.addr = alloca i32, align 4
  store i32 %x, i32* %x.addr, align 4
  call void @llvm.dbg.declare(metadata i32* %x.addr, metadata !11, metadata !DIExpression()), !dbg !12
  %0 = load i32, i32* %x.addr, align 4, !dbg !13
  %cmp = icmp eq i32 %0, 0, !dbg !15
  br i1 %cmp, label %if.then, label %if.end, !dbg !16

if.then:                                          ; preds = %entry
  store i32 0, i32* %retval, align 4, !dbg !17
  br label %return, !dbg !17

if.end:                                           ; preds = %entry
  %1 = load i32, i32* %x.addr, align 4, !dbg !18
  %cmp1 = icmp slt i32 %1, 0, !dbg !20
  br i1 %cmp1, label %if.then2, label %if.else, !dbg !21

if.then2:                                         ; preds = %if.end
  store i32 -1, i32* %retval, align 4, !dbg !22
  br label %return, !dbg !22

if.else:                                          ; preds = %if.end
  store i32 1, i32* %retval, align 4, !dbg !23
  br label %return, !dbg !23

return:                                           ; preds = %if.else, %if.then2, %if.then
  %2 = load i32, i32* %retval, align 4, !dbg !24
  ret i32 %2, !dbg !24
}

---more---

KLEE will work with this IR code to perform symbolic execution.

Now we can run KLEE:

klee@9523002f7332:/host/test1$ klee test1.bc 
KLEE: output directory is "/host/test1/klee-out-0"
KLEE: Using STP solver backend

KLEE: done: total instructions = 33
KLEE: done: completed paths = 3
KLEE: done: partially completed paths = 0
KLEE: done: generated tests = 3
klee@9523002f7332:/host/test1$

And we see it detects 3 paths in less than a second.

klee@9523002f7332:/host/test1$ cd klee-out-0/
klee@9523002f7332:/host/test1/klee-out-0$ ls
assembly.ll  info  messages.txt  run.istats  run.stats  test000001.ktest  test000002.ktest  test000003.ktest  warnings.txt
klee@9523002f7332:/host/test1/klee-out-0$

KLEE returns 3 test case files, one for each path of the program it explored.
These files are in binary format, and can be parsed with ktest-tool.

For information regarding the other files that KLEE returns, check out https://klee.github.io/docs/files/

klee@9523002f7332:/host/test1/klee-out-0$ ktest-tool  test000001.ktest 
ktest file : 'test000001.ktest'
args       : ['test1.bc']
num objects: 1
object 0: name: 'a'
object 0: size: 4
object 0: data: b'\x00\x00\x00\x00'
object 0: hex : 0x00000000
object 0: int : 0
object 0: uint: 0
object 0: text: ....
klee@9523002f7332:/host/test1/klee-out-0$ ktest-tool  test000002.ktest 
ktest file : 'test000002.ktest'
args       : ['test1.bc']
num objects: 1
object 0: name: 'a'
object 0: size: 4
object 0: data: b'\x01\x01\x01\x01'
object 0: hex : 0x01010101
object 0: int : 16843009
object 0: uint: 16843009
object 0: text: ....
klee@9523002f7332:/host/test1/klee-out-0$ ktest-tool  test000003.ktest 
ktest file : 'test000003.ktest'
args       : ['test1.bc']
num objects: 1
object 0: name: 'a'
object 0: size: 4
object 0: data: b'\x00\x00\x00\x80'
object 0: hex : 0x00000080
object 0: int : -2147483648
object 0: uint: 2147483648
object 0: text: ....
klee@9523002f7332:/host/test1/klee-out-0$

As expected, KLEE returns concrete values that exercise all 3 paths: 0, a value above 0, and a value below 0.
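
To see how those byte strings map onto the three paths, here is a small native sketch (not part of KLEE) that decodes each 4-byte object the way an x86-64 machine stores an int, least significant byte first, and feeds it to get_sign. The helper name le_bytes_to_i32 is made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Same function as in test1.c. */
int get_sign(int x) {
	if (x == 0)
		return 0;

	if (x < 0)
		return -1;
	else
		return 1;
}

/* Hypothetical helper: rebuild a 32-bit int from the raw bytes that
 * ktest-tool prints in the "data" field (little-endian, as on x86-64). */
int32_t le_bytes_to_i32(const unsigned char b[4]) {
	uint32_t u = (uint32_t)b[0]
	           | ((uint32_t)b[1] << 8)
	           | ((uint32_t)b[2] << 16)
	           | ((uint32_t)b[3] << 24);
	int32_t v;
	memcpy(&v, &u, sizeof v);  /* reinterpret the bits as a signed value */
	return v;
}
```

Feeding it the three data fields (b'\x00\x00\x00\x00', b'\x01\x01\x01\x01' and b'\x00\x00\x00\x80') gives 0, 16843009 and -2147483648, so get_sign returns 0, 1 and -1. It also explains why ktest-tool prints hex 0x00000080 for the last case: the hex field lists the bytes in memory order.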

Amazing! Now let’s move on to a slightly more interesting program.

Test2

#include <klee/klee.h>

/*
 * Simple regular expression matching.
 *
 * From:
 *   The Practice of Programming
 *   Brian W. Kernighan, Rob Pike
 * 
 * http://klee.github.io/tutorials/testing-regex/
 *
 */


static int matchhere(char*,char*);

static int matchstar(int c, char *re, char *text) {
	do {
		if (matchhere(re, text))
			return 1;
	} while (*text != '\0' && (*text++ == c || c== '.'));
	return 0;
}

static int matchhere(char *re, char *text) {
	if (re[0] == '\0')
		return 0;
	if (re[1] == '*')
		return matchstar(re[0], re+2, text);
	if (re[0] == '$' && re[1]=='\0')
		return *text == '\0';
	if (*text!='\0' && (re[0]=='.' || re[0]==*text))
		return matchhere(re+1, text+1);
	return 0;
}

int match(char *re, char *text) {
	if (re[0] == '^')
		return matchhere(re+1, text);
	do {
		if (matchhere(re, text))
			return 1;
	} while (*text++ != '\0');
	return 0;
}

/*
 * Harness for testing with KLEE.
 */

// The size of the buffer to test with.
#define SIZE 7

int main() {
	// The input regular expression.
	char re[SIZE];

	// Make the input symbolic.
	klee_make_symbolic(re, sizeof re, "re");

	// Try to match against a constant string "hello".
	match(re, "hello");

	return 0;
}

The above is a small regex matcher that supports ^, *, . and $.

For example, .*$ will match any text input.

This program is much more complicated than the previous one, so let’s see how KLEE does.

klee@9523002f7332:/host/test2$ clang -I /home/klee/klee_src/include -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone test2.c
klee@9523002f7332:/host/test2$ klee test2.bc
KLEE: output directory is "/host/test2/klee-out-0"
KLEE: Using STP solver backend
KLEE: ERROR: test2.c:26: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
KLEE: ERROR: test2.c:28: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location

KLEE: done: total instructions = 4848112
KLEE: done: completed paths = 6675
KLEE: done: partially completed paths = 763
KLEE: done: generated tests = 6677
klee@9523002f7332:/host/test2$

In roughly 10 seconds, KLEE finished the analysis, completing 6675 paths and finding 2 memory errors.

That’s a lot of test files to store.
We can limit test generation (concretization) to states that actually covered new code by passing --only-output-states-covering-new to KLEE.

If we had done that, only 16 test cases would have been generated. (We will use this for all following tests)

Note that many realistic programs have an infinite (or extremely large) number of paths through them, and it is common that KLEE will not terminate.

By default KLEE will run until the user presses Control-C (i.e. klee gets a SIGINT), but there are additional options to limit KLEE’s runtime and memory usage:

--max-time=<time span>: Halt execution after the given amount of time, e.g. 10min or 1h5s.
--max-forks=N: Stop forking after N symbolic branches, and run the remaining paths to termination.
--max-memory=N: Try to limit memory consumption to N megabytes.

When KLEE detects an error in the program being executed it will generate a test case which exhibits the error, and write some additional information about the error into a file testN.TYPE.err, where N is the test case number, and TYPE identifies the kind of error.

For all program errors, KLEE will write a simple backtrace into the .err file.

klee@9523002f7332:/host/test2/klee-out-0$ ls *.err
test000022.ptr.err  test000023.ptr.err
klee@9523002f7332:/host/test2/klee-out-0$ cat test000022.ptr.err 
Error: memory error: out of bound pointer
File: test2.c
Line: 26
assembly.ll line: 78
State: 38
Stack: 
        #000000078 in matchhere(re=94581715805575, text=94581715805504) at test2.c:26
        #100000206 in matchstar(c=symbolic, re=94581715805575, text=94581715805504) at test2.c:19
        #200000103 in matchhere(re=94581715805573, text=94581715805504) at test2.c:29
        #300000206 in matchstar(c=symbolic, re=94581715805573, text=94581715805504) at test2.c:19
        #400000103 in matchhere(re=94581715805571, text=94581715805504) at test2.c:29
        #500000206 in matchstar(c=symbolic, re=94581715805571, text=94581715805504) at test2.c:19
        #600000103 in matchhere(re=94581715805569, text=94581715805504) at test2.c:29
        #700000030 in match(re=94581715805568, text=94581715805504) at test2.c:39
        #800000182 in main() at test2.c:62
Info: 
        address: 94581715805575
        next: object at 22978515913408 of size 1536
                MO9[1536] (no allocation info)
klee@9523002f7332:/host/test2/klee-out-0$ cat test000023.ptr.err 
Error: memory error: out of bound pointer
File: test2.c
Line: 28
assembly.ll line: 90
State: 110
Stack: 
        #000000090 in matchhere(re=94581715805574, text=94581715805504) at test2.c:28
        #100000206 in matchstar(c=symbolic, re=94581715805574, text=94581715805504) at test2.c:19
        #200000103 in matchhere(re=94581715805572, text=94581715805504) at test2.c:29
        #300000206 in matchstar(c=symbolic, re=94581715805572, text=94581715805504) at test2.c:19
        #400000103 in matchhere(re=94581715805570, text=94581715805504) at test2.c:29
        #500000206 in matchstar(c=symbolic, re=94581715805570, text=94581715805504) at test2.c:19
        #600000103 in matchhere(re=94581715805568, text=94581715805504) at test2.c:29
        #700000037 in match(re=94581715805568, text=94581715805504) at test2.c:41
        #800000182 in main() at test2.c:62
Info: 
        address: 94581715805575
        next: object at 22978515913408 of size 1536
                MO9[1536] (no allocation info)
klee@9523002f7332:/host/test2/klee-out-0$

We can see that the two errors occur in matchhere, at source lines 26 and 28, the two places where re is read:

if (re[0] == '\0')    /* line 26 */
if (re[1] == '*')     /* line 28 */

However, if we check the concrete test case:

klee@9523002f7332:/host/test2/klee-out-0$ ktest-tool  test000022.ktest 
ktest file : 'test000022.ktest'
args       : ['test2.bc']
num objects: 1
object 0: name: 're'
object 0: size: 7
object 0: data: b'^\x01*\x01*\x01*'
object 0: hex : 0x5e012a012a012a
object 0: text: ^.*.*.*
klee@9523002f7332:/host/test2/klee-out-0$

We see the buffer is rendered as ^.*.*.* (the dots stand for non-printable \x01 bytes). All 7 bytes are non-zero, so it lacks the terminating null that the program requires.
This is actually an issue with our harness!
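
A quick native check (a sketch, independent of KLEE) confirms the problem: none of the 7 bytes in the reported test case is '\0', so any code that walks re as a C string reads past the end of the buffer. The helper has_terminator and the two sample arrays are names invented here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf[0..n-1] contains a '\0'. */
int has_terminator(const char *buf, size_t n) {
	return memchr(buf, '\0', n) != NULL;
}

/* The 7 bytes from test000022.ktest: ^ \x01 * \x01 * \x01 * */
const char klee_case[7] = {'^', 0x01, '*', 0x01, '*', 0x01, '*'};

/* A well-formed input of the same size: 6 characters plus the '\0'. */
const char good_case[7] = "^h.*o$";
```

has_terminator(klee_case, sizeof klee_case) is 0, while has_terminator(good_case, sizeof good_case) is 1. Only the second kind of input is something match() was ever meant to receive.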

We will need to add the line

klee_assume(re[SIZE - 1] == '\0');

to force KLEE to only explore states where the buffer is null terminated.

klee_assume takes a single argument (an unsigned integer) which generally should be some kind of conditional expression, and “assumes” that expression to be true on the current path (if that can never happen, i.e. the expression is provably false, KLEE will report an error).

A warning from the developers of KLEE:

There is one important caveat when using klee_assume with multiple conditions.
Remember that boolean conditionals like ‘&&’ and ‘||’ may be compiled into code which branches before computing the result of the expression.
In such situations KLEE will branch the process before it reaches the call to klee_assume, which may result in exploring unnecessary additional states.
For this reason it is good to use as simple expressions as possible to klee_assume (for example splitting a single call into multiple ones), and to use the ‘&’ and ‘|’ operators instead of the short-circuiting ones.
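
To illustrate the caveat, the sketch below writes the same two-part condition both ways in plain C (no KLEE is needed to compile it). The function names and the particular bounds are made up. For operands that are themselves side-effect-free comparisons, both forms produce the same value; the difference is that && is typically compiled to a branch, while & evaluates both sides and combines them.

```c
/* Short-circuit form: usually compiled to a branch on (a > 0)
 * before (b < 10) is evaluated. */
int cond_short_circuit(int a, int b) {
	return (a > 0) && (b < 10);
}

/* Bitwise form: both comparisons are evaluated, then combined with &,
 * so the whole condition is a single expression. */
int cond_bitwise(int a, int b) {
	return (a > 0) & (b < 10);
}
```

In a harness, the second shape corresponds to klee_assume((a > 0) & (b < 10)); or, simpler still, to two separate calls: klee_assume(a > 0); klee_assume(b < 10);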

klee@9523002f7332:/host/test2$ klee --only-output-states-covering-new test2.bc
KLEE: output directory is "/host/test2/klee-out-0"
KLEE: Using STP solver backend

KLEE: done: total instructions = 4235858
KLEE: done: completed paths = 5895
KLEE: done: partially completed paths = 0
KLEE: done: generated tests = 15
klee@9523002f7332:/host/test2$

This time no error is found, and all paths are exercised.

Now let’s move on to test 3.

Test3

#include <malloc.h>

#include <klee/klee.h>

/*
 * buffer structure:
 *  0x0-0x2: checksum 0x12239f
 *  0x3: size
 */

#define BUFFER_SZ 0x4

typedef unsigned char BYTE;

int check_buffer(BYTE *buffer);
void special_memA(BYTE *buffer, BYTE *mem);
__attribute__((always_inline)) BYTE *alloc_mem(BYTE sz);
__attribute__((always_inline)) void free_mem(BYTE *mem);

int check_buffer(BYTE *buffer)
{
    if (buffer[0] == 0x12)
        if (buffer[1] == 0x23)
            if (buffer[2] == 0x9f)
                return 1;
    return 0;
}


void special_memA(BYTE *buffer, BYTE *mem)
{
    int c;

    if (check_buffer(buffer)) {
        c = buffer[3];
        while (c--)
            *mem++ = 'A';
    }
}

__attribute__((always_inline)) BYTE *alloc_mem(BYTE sz)
{
    return malloc(sz);
}

__attribute__((always_inline)) void free_mem(BYTE *mem)
{
    free(mem);
}

int main(void)
{

    BYTE sz, *mem;
    BYTE buffer[BUFFER_SZ];

    klee_make_symbolic(&sz, sizeof(BYTE), "mem size");
    klee_make_symbolic(buffer, BUFFER_SZ, "request buffer");

    klee_assume(sz > 0);

    mem = alloc_mem(sz);

    special_memA(buffer, mem);

    free_mem(mem);
    
    return 0;
}

This is a checksum program that checks whether the header bytes of a buffer match a fixed value before moving on to a memory operation.
Traditional blackbox fuzzers often fare poorly against these programs, because it takes the mutation engine a long time before hitting the correct bytes.
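
A back-of-the-envelope sketch of why this is hard for a purely random generator: if inputs were drawn uniformly at random (a simplifying assumption; real fuzzers mutate seeds rather than sample uniformly), each of the 3 magic bytes matches with probability 1/256, so the expected number of attempts is 256^3. The function name is made up.

```c
#include <stdint.h>

/* Expected number of uniformly random attempts needed to hit
 * `magic_bytes` specific byte values: 256^magic_bytes.
 * Only meaningful for magic_bytes <= 7, so the result fits in 64 bits. */
uint64_t expected_random_tries(unsigned magic_bytes) {
	uint64_t tries = 1;
	for (unsigned i = 0; i < magic_bytes; i++)
		tries *= 256;
	return tries;
}
```

For the 3-byte checksum here, expected_random_tries(3) is 16777216, and that is before the fuzzer even starts exploring the size byte.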

Of course we can use klee_assume to feed in the checksum, but let’s see how long KLEE will take to figure it out.

After solving the checksum there’s a logic bug that leads to a heap OOB write for KLEE to discover.
We assume sz > 0 so that KLEE does not model malloc as returning a failure.

Results:

klee@9523002f7332:/host/test3$ clang -I /home/klee/klee_src/include -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone test3.c
klee@9523002f7332:/host/test3$ klee --only-output-states-covering-new test3.bc
KLEE: output directory is "/host/test3/klee-out-0"
KLEE: Using STP solver backend
KLEE: ERROR: test3.c:43: concretized symbolic size
KLEE: NOTE: now ignoring this error at this location
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
KLEE: ERROR: test3.c:37: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location

KLEE: done: total instructions = 145
KLEE: done: completed paths = 5
KLEE: done: partially completed paths = 2
KLEE: done: generated tests = 6
klee@9523002f7332:/host/test3$

In just a second KLEE was able to solve the checksum and discover the memory bug.

It reports 2 errors. The first is a concretized symbolic size error: KLEE had to pick a concrete value for the symbolic size passed to malloc, which is its way of warning the user that the argument is freely controllable and may lead to a huge allocation that depletes system memory.

The second error is more interesting, and it seems like our OOB write.

klee@9523002f7332:/host/test3$ cd klee-out-0/;ls
assembly.ll  messages.txt  run.stats          test000001.ktest      test000002.ktest  test000004.ktest  test000006.kquery  test000006.ptr.err
info         run.istats    test000001.kquery  test000001.model.err  test000003.ktest  test000005.ktest  test000006.ktest   warnings.txt
klee@9523002f7332:/host/test3/klee-out-0$ ktest-tool test000006.ktest 
ktest file : 'test000006.ktest'
args       : ['test3.bc']
num objects: 2
object 0: name: 'mem size'
object 0: size: 1
object 0: data: b'\x01'
object 0: hex : 0x01
object 0: int : 1
object 0: uint: 1
object 0: text: .
object 1: name: 'request buffer'
object 1: size: 4
object 1: data: b'\x12#\x9f\x02'
object 1: hex : 0x12239f02
object 1: int : 43983634
object 1: uint: 43983634
object 1: text: .#..
klee@9523002f7332:/host/test3/klee-out-0$ cat test000006.ptr.err
Error: memory error: out of bound pointer
File: test3.c
Line: 37
assembly.ll line: 90
State: 1
Stack: 
        #000000090 in special_memA(buffer=94679447078000, mem=94679447078144) at test3.c:37
        #100000157 in main() at test3.c:64
Info: 
        address: 94679447078145
        next: object at 22640733514432 of size 1536
                MO13[1536] (no allocation info)
klee@9523002f7332:/host/test3/klee-out-0$

We can clearly see the mem size argument being set to 1, so only 1 byte of memory is allocated.
However, the request buffer triggers a write of 2 bytes, leading to a heap overflow (with the checksum easily solved!).

The problem now is that most malloc implementations have a minimum size policy.
On 64-bit Linux with glibc, the smallest chunk is 0x20 bytes (including its header), even if you request only 1 byte.

That means we will not be able to reproduce the bug as a crash if we use it directly on a production copy of the binary, because writing 2 bytes to a 0x20 sized allocation is not an issue.
(Another caveat of blackbox fuzzing, where such subtle overflows will not be detected as a crash)
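
This is easy to observe natively. The sketch below asks the allocator how many bytes are really usable behind a small request. It relies on malloc_usable_size from <malloc.h>, which is a glibc extension (an assumption about the platform; other C libraries may not provide it), and the wrapper name is made up.

```c
#include <malloc.h>   /* malloc_usable_size: glibc extension */
#include <stddef.h>
#include <stdlib.h>

/* Returns how many bytes the allocator actually lets us use for a
 * request of `request` bytes, or 0 if the allocation failed. */
size_t usable_bytes(size_t request) {
	void *p = malloc(request);
	if (p == NULL)
		return 0;
	size_t usable = malloc_usable_size(p);
	free(p);
	return usable;
}
```

On a typical 64-bit glibc system, usable_bytes(1) reports well over 2 bytes, which is why the 2-byte write stays inside the chunk and an uninstrumented binary does not crash.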

We can enable AddressSanitizer (discussed in the first post of my fuzzing series) and debug symbols to triage the bug.

klee@9523002f7332:/host/test3$ export LD_LIBRARY_PATH=/home/klee/klee_build/lib/:$LD_LIBRARY_PATH
klee@9523002f7332:/host/test3$ gcc -I /home/klee/klee_src/include/ -L /home/klee/klee_build/lib/ test3.c -lkleeRuntest -fsanitize=address -g
test3.c:46:37: warning: always_inline function might not be inlinable [-Wattributes]
 __attribute__((always_inline)) void free_mem(BYTE *mem)
                                     ^~~~~~~~
test3.c:41:38: warning: always_inline function might not be inlinable [-Wattributes]
 __attribute__((always_inline)) BYTE *alloc_mem(BYTE sz)
                                      ^~~~~~~~~
klee@9523002f7332:/host/test3$ KTEST_FILE=klee-out-0/test000006.ktest ./a.out 
=================================================================
==118==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000d1 at pc 0x562f9b600da8 bp 0x7fff8aa98490 sp 0x7fff8aa98480
WRITE of size 1 at 0x6020000000d1 thread T0
    #0 0x562f9b600da7 in special_memA /host/test3/test3.c:37
    #1 0x562f9b600f74 in main /host/test3/test3.c:64
    #2 0x14b8d788ac86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
    #3 0x562f9b600b39 in _start (/host/test3/a.out+0xb39)

0x6020000000d1 is located 0 bytes to the right of 1-byte region [0x6020000000d0,0x6020000000d1)
allocated by thread T0 here:
    #0 0x14b8d7f3cb40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x562f9b600f56 in alloc_mem /host/test3/test3.c:43
    #2 0x562f9b600f56 in main /host/test3/test3.c:62
    #3 0x14b8d788ac86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)

SUMMARY: AddressSanitizer: heap-buffer-overflow /host/test3/test3.c:37 in special_memA
Shadow bytes around the buggy address:
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff8000: fa fa 00 fa fa fa 00 01 fa fa 00 01 fa fa 01 fa
=>0x0c047fff8010: fa fa 00 07 fa fa 04 fa fa fa[01]fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==118==ABORTING
klee@9523002f7332:/host/test3$

Replaying the test case, we successfully reproduce the memory corruption bug.

ASAN tells us the bug is in special_memA at /host/test3/test3.c:37: a WRITE of size 1 located 0 bytes to the right of our rightfully allocated 1-byte region. In the shadow memory dump, that region is the partially addressable entry marked [01].
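
The highlighted shadow byte can be derived by hand. On x86-64 Linux, ASan maps every 8 application bytes to one shadow byte at (addr >> 3) + 0x7fff8000, and a shadow value k between 1 and 7 means only the first k bytes of that 8-byte granule are addressable. The sketch below encodes those two rules; the function names are made up, and the offset constant is the x86-64 Linux default rather than something read from this binary.

```c
#include <stdint.h>

/* Default ASan shadow offset on x86-64 Linux (assumed here). */
#define ASAN_SHADOW_OFFSET 0x7fff8000ULL

/* Address of the shadow byte that describes application address `addr`. */
uint64_t asan_shadow_addr(uint64_t addr) {
	return (addr >> 3) + ASAN_SHADOW_OFFSET;
}

/* Is a 1-byte access at `addr` allowed, given its shadow byte?
 *   0       : all 8 bytes of the granule are addressable
 *   1..7    : only the first k bytes are addressable
 *   >= 0x80 : poisoned (redzone, freed memory, ...) */
int asan_byte_ok(uint64_t addr, uint8_t shadow) {
	if (shadow == 0)
		return 1;
	if (shadow >= 0x80)
		return 0;
	return (addr & 7) < shadow;
}
```

For the faulting address 0x6020000000d1, asan_shadow_addr gives 0x0c047fff801a, the eleventh byte of the row starting at 0x0c047fff8010, which is exactly the bracketed 01 in the dump. With shadow value 1, the byte at 0x6020000000d0 is valid but the one at 0x6020000000d1 is not.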

Line 37 corresponds to

*mem++ = 'A';

Up to this point, KLEE has done exceptionally well in our 3 tests, but we have yet to test its true capabilities, which is to reason about symbolic environments.

Test4

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int check_password(char *buf) {
  return !strcmp(buf, "Th1$IsmYP4s$w0rd!");
}

int check_password2(int fd) {
  char buf[5];
  if (read(fd, buf, 5) != -1) {
    if (buf[0] == 'h' && buf[1] == 'e' &&
	buf[2] == 'l' && buf[3] == 'l' &&
	buf[4] == 'o')
      return 1;
  }
  return 0;
}


int main(int argc, char **argv) {
  int fd;

  if (argc < 3)
     return 1;
  
  if ((fd = open(argv[2], O_RDONLY)) == -1) {
     puts("file not found");
     return 1;
  }

  if (check_password(argv[1]) && check_password2(fd)) {
    puts("Password found!");
    return 0;
  }

  puts("Wrong password");
  return 1;
}

This rather contrived program (which nonetheless resembles real-life software more closely) would not be fully explored if we analysed it with our previous steps.

The first reason is that the program takes its input from command-line arguments instead of a buffer hardcoded in the source.

We can of course patch the program such that it takes a hardcoded symbolic buffer instead, but KLEE can deal with arguments too.

The second reason is that the flow of the program entirely depends on the return value of strcmp, a library function.

By default, KLEE does not symbolically execute library functions it has no bitcode for; it treats them as external calls to native code.
Native strcmp cannot reason about a symbolic buffer, so on its own KLEE would never derive the correct password.

Lastly, it also tries to access a file (whose name we provide as an argument) to check for another password.
We will have to make the file symbolic as well.

KLEE has a number of arguments to deal with the environment.

klee@9523002f7332:/host/test4$ klee --only-output-states-covering-new --libc=uclibc --posix-runtime test4.bc -sym-arg 20 A -sym-files 1 20            
KLEE: NOTE: Using POSIX model: /tmp/klee_build110stp_z3/runtime/lib/libkleeRuntimePOSIX64_Debug+Asserts.bca
KLEE: NOTE: Using klee-uclibc : /tmp/klee_build110stp_z3/runtime/lib/klee-uclibc.bca
KLEE: output directory is "/host/test4/klee-out-0"
KLEE: Using STP solver backend
warning: Linking two modules of different target triples: test4.bc' is 'x86_64-unknown-linux-gnu' whereas '__uClibc_main.os' is 'x86_64-pc-linux-gnu'

KLEE: WARNING: executable has module level assembly (ignoring)
KLEE: WARNING ONCE: calling external: syscall(16, 0, 21505, 94605649392848) at klee_src/runtime/POSIX/fd.c:1007 10
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
KLEE: WARNING ONCE: calling __klee_posix_wrapped_main with extra arguments.
Wrong password
Wrong password
Wrong passwordWrong password
Wrong password
Wrong password
Wrong password
Wrong password
Wrong password
Wrong password
Wrong password
Wrong passwordWrong password
Wrong password
Wrong password

Wrong passwordWrong password
Wrong password

Wrong password
Wrong passwordWrong password
Wrong passwordPassword found!Wrong password





KLEE: done: total instructions = 39252
KLEE: done: completed paths = 24
KLEE: done: partially completed paths = 0
KLEE: done: generated tests = 7
klee@9523002f7332:/host/test4$

First of all, we deal with the issue of library functions.

By using --libc=uclibc, we force KLEE to link the LLVM bitcode with an instrumented version of uclibc, so KLEE can symbolically analyse into library functions.
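
Why does that help? Once strcmp is available as bitcode, KLEE sees it as an ordinary loop. The sketch below is not uclibc’s actual implementation, just a minimal byte-wise comparison with a made-up name, but it shows the idea: every iteration branches on one character, so the path on which all comparisons succeed carries the constraint that the symbolic argument equals the password.

```c
/* Minimal strcmp-like routine (illustrative, not uclibc's code).
 * Returns 0 when the strings are equal, otherwise the difference of the
 * first pair of bytes that differ. */
int sketch_strcmp(const char *a, const char *b) {
	while (*a != '\0' && *a == *b) {  /* one branch per character */
		a++;
		b++;
	}
	return (unsigned char)*a - (unsigned char)*b;
}
```

With a concrete second argument such as "Th1$IsmYP4s$w0rd!", the only way to reach a return value of 0 is to match every byte in turn, and that chain of equalities is what the solver hands back as the test case.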

The --posix-runtime flag adds support for modelling low-level POSIX syscalls, as we discussed previously in the theory post. These two flags should be used together in most cases.

The line test4.bc -sym-arg 20 A -sym-files 1 20 passes the arguments to the test4.bc program.
Instead of concrete arguments such as test4.bc password xx.txt, we pass symbolic ones.

-sym-arg 20 defines a single symbolic argument 20 bytes long, and -sym-files 1 20 creates a single symbolic file 20 bytes in size.

What about the A?

That’s just weird KLEE syntax which I don’t agree with.
KLEE names its symbolic files as A, B, C… etc, so by passing A we pass the symbolic file to the program.

Quoting from an issue opened on GitHub (https://github.com/klee/klee/issues/712):

If your tested applications does a fopen(argv[1], ...), the KLEE runtime intercepts the call and the symbolic file gets opened. If you provide a different file name (e.g. /etc/passwd), the real file gets opened.

Note that the position of the arguments matters.

The program name and its arguments must come last among the arguments passed to KLEE.

Now if we check the output directory, we see the test case used to find the correct password.

klee@9523002f7332:/host/test4$ cd klee-out-0/;ls  
assembly.ll  messages.txt  run.stats         test000002.ktest  test000004.ktest  test000006.ktest  warnings.txt
info         run.istats    test000001.ktest  test000003.ktest  test000005.ktest  test000007.ktest
klee@9523002f7332:/host/test4/klee-out-0$ ktest-tool test000006.ktest 
ktest file : 'test000006.ktest'
args       : ['test4.bc', '-sym-arg', '20', 'A', '-sym-files', '1', '20']
num objects: 4
object 0: name: 'arg00'
object 0: size: 21
object 0: data: b'Th1$IsmYP4s$w0rd!\x00\xff\xff\xff'
object 0: hex : 0x5468312449736d5950347324773072642100ffffff
object 0: text: Th1$IsmYP4s$w0rd!....
object 1: name: 'A-data'
object 1: size: 20
object 1: data: b'hello\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
object 1: hex : 0x68656c6c6fffffffffffffffffffffffffffffff
object 1: text: hello...............
object 2: name: 'A-data-stat'
object 2: size: 144
object 2: data: b'/\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x00\x00\x00\x00\xa4\x81\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\x00\x10\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\x1b\xf2\xabc\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\x1b\xf2\xabc\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\x1b\xf2\xabc\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
object 2: hex : 0x2f0000000000000001000000ffffffff0100000000000000a48100000000000000000000ffffffff0000000000000000ffffffffffffffff0010000000000000ffffffffffffffff1bf2ab6300000000ffffffffffffffff1bf2ab6300000000ffffffffffffffff1bf2ab6300000000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
object 2: text: /..........................................................................c...............c...............c....................................
object 3: name: 'model_version'
object 3: size: 4
object 3: data: b'\x01\x00\x00\x00'
object 3: hex : 0x01000000
object 3: int : 1
object 3: uint: 1
object 3: text: ....
klee@9523002f7332:/host/test4/klee-out-0$ cd ..
klee@9523002f7332:/host/test4$ gcc -o test4 ./test4.c 
klee@9523002f7332:/host/test4$ klee-replay test4 klee-out-0/test000006.ktest 
KLEE-REPLAY: NOTE: Test file: klee-out-0/test000006.ktest
KLEE-REPLAY: NOTE: Arguments: "test4" "Th1$IsmYP4s$w0rd!" "A" 
KLEE-REPLAY: NOTE: Storing KLEE replay files in /tmp/klee-replay-ohNHIA
KLEE-REPLAY: NOTE: Creating file /tmp/klee-replay-ohNHIA/A of length 20
KLEE-REPLAY: WARNING: check_file A: dev mismatch: 197 vs 47
Password found!
KLEE-REPLAY: NOTE: EXIT STATUS: NORMAL (0 seconds)
KLEE-REPLAY: NOTE: removing /tmp/klee-replay-ohNHIA
klee@9523002f7332:/host/test4$

Note how we used klee-replay to replay the testcase this time instead of KTEST_FILE.
This is because the KTEST_FILE method is unable to replay symbolic arguments.

Without external tools, KLEE is unable to display information that helps us determine the exact test case that triggered the “Password found!” path.
This is quite annoying as we will have to replay every test case to find out.

One possible solution is to compile uclibc such that printf is symbolically analysed too with -DKLEE_SYM_PRINTF, then capture stdout by making it symbolic with -sym-stdout.
However this adds a ton of overhead.

At the time of writing, I’m not aware of any tools that help categorize test cases.

Anyways, it was still able to find the password :)

Enough with toy programs, let’s try KLEE on some real open source software.

Test5

https://github.com/jmcnamara/libxlsxwriter

Reading the docs (https://libxlsxwriter.github.io/getting_started.html) tells us that we need to use make to compile the source to a shared library.
Afterwards we can link our harness with this library using KLEE’s --link-llvm-lib flag.

First we edit the Makefile to change the build location:

PREFIX ?= /host/test5/libxlsxwriter/

Then we build the library with wllvm, which allows us to later extract the LLVM bitcode from the compiled library.

klee@9523002f7332:/host/test5/libxlsxwriter$ export LLVM_COMPILER=clang
klee@9523002f7332:/host/test5/libxlsxwriter$ CC=wllvm CFLAGS="-g -O1 -Xclang -disable-llvm-passes -D__NO_STRING_INLINES  -D_FORTIFY_SOURCE=0 -U__OPTIMIZE__" make
make[1]: Entering directory '/host/test5/libxlsxwriter/third_party/minizip'

We compile with -O1 -Xclang -disable-llvm-passes instead of -O0 -Xclang -disable-O0-optnone, because it produces bitcode that works better with KLEE’s --optimize.

-D__NO_STRING_INLINES -D_FORTIFY_SOURCE=0 -U__OPTIMIZE__ prevents clang from replacing functions with safer versions, which KLEE might not support.

Now we should have the libraries built.

klee@9523002f7332:/host/test5/libxlsxwriter$ cd lib
klee@9523002f7332:/host/test5/libxlsxwriter/lib$ ls
libxlsxwriter.a  libxlsxwriter.so  libxlsxwriter.so.4
klee@9523002f7332:/host/test5/libxlsxwriter/lib$

.a is static, while .so is dynamic.

We’ll use the dynamic version in this case so KLEE can use its own uclibc.

klee@9523002f7332:/host/test5/libxlsxwriter/lib$ extract-bc libxlsxwriter.so
klee@9523002f7332:/host/test5/libxlsxwriter/lib$ ls
libxlsxwriter.a  libxlsxwriter.so  libxlsxwriter.so.4  libxlsxwriter.so.bc
klee@9523002f7332:/host/test5/libxlsxwriter/lib$

With the libraries in place, we need a harness to trigger library functions.

In my first run I’ll exercise the part of code that deals with formulas, since parsing formulas is relatively complex.

https://libxlsxwriter.github.io/working_with_formulas.html

The worksheet_write_dynamic_array_formula() function writes an Excel 365 dynamic array formula to a cell range.


lxw_error worksheet_write_dynamic_array_formula	(
		lxw_worksheet *worksheet,
		lxw_row_t     first_row,
		lxw_col_t     first_col,
		lxw_row_t     last_row,
		lxw_col_t     last_col,
		const char    *formula,
		lxw_format    *format 
)	

Dynamic array formulas and their usage in libxlsxwriter are explained in detail in Dynamic Array support.

The following is an example usage:
worksheet_write_dynamic_array_formula(worksheet, 1, 5, 1, 5, "=_xlfn._xlws.FILTER(A1:D17,C1:C17=K2)", NULL);

Modifying to add KLEE instrumentation:

#include <klee/klee.h>

#include "xlsxwriter.h"

#define FORMULA_SZ 20

int main() {
 
    lxw_workbook  *workbook  = workbook_new("myexcel.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);
    
    int firstrow, firstcol, lastrow, lastcol;
    char formula[FORMULA_SZ];

    klee_make_symbolic(&firstrow, sizeof(int), "first row");
    klee_make_symbolic(&firstcol, sizeof(int), "first column");
    klee_make_symbolic(&lastrow, sizeof(int), "last row");
    klee_make_symbolic(&lastcol, sizeof(int), "last column");

    klee_make_symbolic(formula, sizeof(formula), "formula");
    klee_assume(formula[FORMULA_SZ-1] == 0);

    worksheet_write_dynamic_array_formula(worksheet, firstrow, firstcol, lastrow, lastcol, formula, NULL);
 
    return workbook_close(workbook);
}

Then we run KLEE:

klee --optimize --only-output-states-covering-new --libc=uclibc --posix-runtime --link-llvm-lib libxlsxwriter.so.bc test5.bc

After 2 hours:

KLEE: done: total instructions = 1052415449
KLEE: done: completed paths = 6
KLEE: done: partially completed paths = 41043
KLEE: done: generated tests = 36

Only 6 completed paths, that’s quite terrible actually…

I’m not sure what went wrong, but maybe my symbolic buffer was so large that the STP solver spent most of its time solving constraints?

I’ll have to instrument this with gcov or something to find out.

KLEE did however find one trivial bug in worksheet.c:7976 within seconds of running:

klee@9523002f7332:/host/test5/libxlsxwriter/lib/klee-out-0$ cat test000001.ptr.err 
Error: memory error: out of bound pointer
File: worksheet.c
Line: 7976
assembly.ll line: 144640
State: 226
Stack: 
        #000144640 in _store_array_formula(self=93971184173056, first_row=symbolic, first_col=symbolic, last_row=symbolic, last_col=symbolic, formula=93971183959104, format=0, result=0, is_dynamic=1) at worksheet.c:7976
        #100145057 in worksheet_write_dynamic_array_formula(self=93971184173056, first_row=symbolic, first_col=symbolic, last_row=symbolic, last_col=symbolic, formula=93971183959104, format=0) at worksheet.c:8093
        #200011870 in __klee_posix_wrapped_main() at test5.c:23
        #300009443 in __user_main(1, 93971151508544, 93971151508560) at klee_src/runtime/POSIX/klee_init_env.c:245
        #400002687 in __uClibc_main(93971181887224, 1, 93971151508544, 0, 0, 0, 0) at libc/misc/internals/__uClibc_main.c:401
        #500002852 in main(1, 93971151508544)
Info: 
        address: 93971204442135
        next: object at 22643338611392 of size 1536
                MO1626[1536] (no allocation info)
klee@9523002f7332:/host/test5/libxlsxwriter/lib/klee-out-0$ ktest-tool test000001.ktest 
ktest file : 'test000001.ktest'
args       : ['test5.bc']
num objects: 6
object 0: name: 'model_version'
object 0: size: 4
object 0: data: b'\x01\x00\x00\x00'
object 0: hex : 0x01000000
object 0: int : 1
object 0: uint: 1
object 0: text: ....
object 1: name: 'first row'
object 1: size: 4
object 1: data: b'\x00\x00\x00\x00'
object 1: hex : 0x00000000
object 1: int : 0
object 1: uint: 0
object 1: text: ....
object 2: name: 'first column'
object 2: size: 4
object 2: data: b'\x00\x00\x00\x00'
object 2: hex : 0x00000000
object 2: int : 0
object 2: uint: 0
object 2: text: ....
object 3: name: 'last row'
object 3: size: 4
object 3: data: b'\x00\x00\x00\x00'
object 3: hex : 0x00000000
object 3: int : 0
object 3: uint: 0
object 3: text: ....
object 4: name: 'last column'
object 4: size: 4
object 4: data: b'\x00\x00\x00\x00'
object 4: hex : 0x00000000
object 4: int : 0
object 4: uint: 0
object 4: text: ....
object 5: name: 'formula'
object 5: size: 20
object 5: data: b'{\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00'
object 5: hex : 0x7b00ffffffffffffffffffffffffffffffffff00
object 5: text: {...................
klee@9523002f7332:/host/test5/libxlsxwriter/lib/klee-out-0$

The offending code is:

/* Copy and trip leading "{=" from formula. */
if (formula[0] == '{')
    if (formula[1] == '=')
        formula_copy = lxw_strdup(formula + 2);
    else
        formula_copy = lxw_strdup(formula + 1);
else
    formula_copy = lxw_strdup_formula(formula);

/* Strip trailing "}" from formula. */
if (formula_copy[strlen(formula_copy) - 1] == '}')
    formula_copy[strlen(formula_copy) - 1] = '\0';

When formula is a single {, formula_copy on the heap will contain just the null terminator.

strlen(formula_copy) will then return 0, so the index strlen(formula_copy) - 1 wraps around and the check reads one byte before the start of the heap buffer, which is an OOB access.

The fix is to first check that formula_copy is not empty, and to skip the trailing } check if it is.
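A minimal sketch of such a guard, written as a standalone helper rather than a patch to worksheet.c. The name strip_trailing_brace is made up for illustration, and the actual upstream fix may look different.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of libxlsxwriter): strip one trailing '}'
 * from a NUL-terminated string, doing nothing when the string is empty.
 */
void strip_trailing_brace(char *s)
{
    size_t len = strlen(s);

    /* Only touch s[len - 1] when there is at least one character. */
    if (len > 0 && s[len - 1] == '}')
        s[len - 1] = '\0';
}
```

Storing the length once also avoids calling strlen twice on the same buffer.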

Test6

Let’s try to test the date function this time.
https://libxlsxwriter.github.io/working_with_dates.html

Harness:

#include <klee/klee.h>

#include "xlsxwriter.h"

#define FMT_SZ 30

int main() {
 
    /* Create a new workbook and add a worksheet. */
    lxw_workbook  *workbook  = workbook_new("date_and_times02.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);

    /* A datetime to display. */
    int year, month, day, hour, min;
    double sec;
    
    klee_make_symbolic(&year, sizeof(int), "year");
    klee_make_symbolic(&month, sizeof(int), "month");
    klee_make_symbolic(&day, sizeof(int), "day");
    klee_make_symbolic(&hour, sizeof(int), "hour");
    klee_make_symbolic(&min, sizeof(int), "min");
    klee_make_symbolic(&sec, sizeof(double), "sec");
    lxw_datetime datetime = {year, month, day, hour, min, sec};
 
    /* Add a format with date formatting. */
    lxw_format *format = workbook_add_format(workbook);
    
    char fmtstr[FMT_SZ];
    klee_make_symbolic(fmtstr, sizeof(fmtstr), "format");
    klee_assume(fmtstr[FMT_SZ-1] == 0);

    format_set_num_format(format, fmtstr);
 
    /* Widen the first column to make the text clearer. */
    int firstcol, lastcol, width, firstrow;
    klee_make_symbolic(&firstcol, sizeof(int), "first column");
    klee_make_symbolic(&lastcol, sizeof(int), "last column");
    klee_make_symbolic(&width, sizeof(int), "width");
    klee_make_symbolic(&firstrow, sizeof(int), "first row");
    worksheet_set_column(worksheet, firstcol, lastcol, width, NULL);
 
    /* Write the datetime with formatting. */
    worksheet_write_datetime(worksheet, firstrow, firstcol, &datetime, format);
 
    return workbook_close(workbook);
}

Result:

klee@9523002f7332:/host/test5/libxlsxwriter/lib/klee-last$ cat test000003.ptr.err 
Error: memory error: out of bound pointer
File: utility.c
Line: 399
assembly.ll line: 136107
State: 435
Stack: 
        #000136107 in lxw_datetime_to_excel_date_epoch(datetime=94793297819808, date_1904=0) at utility.c:399
        #100145318 in worksheet_write_datetime(self=94793298026496, row_num=symbolic, col_num=symbolic, datetime=94793297819808, format=94793297915264) at worksheet.c:8163
        #200011941 in __klee_posix_wrapped_main() at test6.c:43
        #300009449 in __user_main(1, 94793263707776, 94793263707792) at klee_src/runtime/POSIX/klee_init_env.c:245
        #400002693 in __uClibc_main(94793296027944, 1, 94793263707776, 0, 0, 0, 0) at libc/misc/internals/__uClibc_main.c:401
        #500002858 in main(1, 94793263707776)
Info: 
        address: 94793299433716
        next: object at 22549396585152 of size 1536
                MO1626[1536] (no allocation info)
klee@9523002f7332:/host/test5/libxlsxwriter/lib/klee-last$ ktest-tool test000003.ktest 
ktest file : 'test000003.ktest'
args       : ['test6.bc']
num objects: 12
object  0: name: 'model_version'
object  0: size: 4
object  0: data: b'\x01\x00\x00\x00'
object  0: hex : 0x01000000
object  0: int : 1
object  0: uint: 1
object  0: text: ....
object  1: name: 'year'
object  1: size: 4
object  1: data: b'\x00:\x02\xb8'
object  1: hex : 0x003a02b8
object  1: int : -1207813632
object  1: uint: 3087153664
object  1: text: .:..
object  2: name: 'month'
object  2: size: 4
object  2: data: b'\x00\x00\x00\x01'
object  2: hex : 0x00000001
object  2: int : 16777216
object  2: uint: 16777216
object  2: text: ....
object  3: name: 'day'
object  3: size: 4
object  3: data: b'\x00\x00\x00\x00'
object  3: hex : 0x00000000
object  3: int : 0
object  3: uint: 0
object  3: text: ....
object  4: name: 'hour'
object  4: size: 4
object  4: data: b'\x00\x00\x00\x00'
object  4: hex : 0x00000000
object  4: int : 0
object  4: uint: 0
object  4: text: ....
object  5: name: 'min'
object  5: size: 4
object  5: data: b'\x00\x00\x00\x00'
object  5: hex : 0x00000000
object  5: int : 0
object  5: uint: 0
object  5: text: ....
object  6: name: 'sec'
object  6: size: 8
object  6: data: b'\x00\x00\x00\x00\x00\x00\x00\x00'
object  6: hex : 0x0000000000000000
object  6: int : 0
object  6: uint: 0
object  6: text: ........
object  7: name: 'format'
object  7: size: 30
object  7: data: b'\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x00'
object  7: hex : 0x00ffffffffffffffffffffffffffffffffffffffffffffffffffffffff00
object  7: text: ..............................
object  8: name: 'first column'
object  8: size: 4
object  8: data: b'\x01\x00\xff\xff'
object  8: hex : 0x0100ffff
object  8: int : -65535
object  8: uint: 4294901761
object  8: text: ....
object  9: name: 'last column'
object  9: size: 4
object  9: data: b'\x00@\xff\xff'
object  9: hex : 0x0040ffff
object  9: int : -49152
object  9: uint: 4294918144
object  9: text: .@..
object 10: name: 'width'
object 10: size: 4
object 10: data: b'\x00\x00\x00\x00'
object 10: hex : 0x00000000
object 10: int : 0
object 10: uint: 0
object 10: text: ....
object 11: name: 'first row'
object 11: size: 4
object 11: data: b'\x00\x00\x01\x00'
object 11: hex : 0x00000100
object 11: int : 65536
object 11: uint: 65536
object 11: text: ....
klee@9523002f7332:/host/test5/libxlsxwriter/lib/klee-last$

Another memory OOB, this time in utility.c:399.

Offending Code:

/*
 * Convert a lxw_datetime struct to an Excel serial date, with a 1900
 * or 1904 epoch.
 */
double
lxw_datetime_to_excel_date_epoch(lxw_datetime *datetime, uint8_t date_1904)
{
    int year = datetime->year;
    int month = datetime->month;
    /* Set month days and check for leap year. */
    int mdays[] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    
    -- snippet --

    /* Add days for previous months. */
    for (i = 0; i < month; i++) {
        days += mdays[i];
    }

    -- snippet --

    return days + seconds;
}

As shown above, the user-specified month is used as the loop bound when iterating over the mdays array.

Specifying a large value for month leads to a controllable array OOB access, resulting in a segmentation fault.

The fix is to check that 0 <= month <= 12 before entering the loop.

I’m also not a fan of the fact that signed integers are used to represent year, month, day and other values that should never be negative.
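A minimal sketch of that range check, isolated from the rest of lxw_datetime_to_excel_date_epoch. The helper name days_before_month and the -1 error return are assumptions made for illustration, and the leap-year adjustment is left out.

```c
/* Same table as in the snippet above: index 0 is a placeholder, 1..12 are the months. */
static const int mdays[] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

/*
 * Hypothetical helper: sum the days of the months before `month`
 * (non-leap year). Returns -1 when month is outside 0..12, so the loop
 * can never index past the end of mdays.
 */
int days_before_month(int month)
{
    int days = 0;
    int i;

    if (month < 0 || month > 12)
        return -1;

    for (i = 0; i < month; i++)
        days += mdays[i];

    return days;
}
```

With the check in place, the month value 200000 from the crashing input is rejected instead of walking off the end of the array.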

klee@9523002f7332:/host/test5/libxlsxwriter/lib$ head bug.c
#include "xlsxwriter.h"
 
int main() {
 
    /* A datetime to display. */
    lxw_datetime datetime = {2013, 200000, 28, 12, 0, 0.0};
 
    /* Create a new workbook and add a worksheet. */
    lxw_workbook  *workbook  = workbook_new("date_and_times02.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);
klee@9523002f7332:/host/test5/libxlsxwriter/lib$ ./bug 
Segmentation fault
klee@9523002f7332:/host/test5/libxlsxwriter/lib$

Fun fact, this bug is actually not reproducible on a gcc-compiled version of this library because…:

if ( month > 0 )
{
  days = mdays;
  if ( month != 1 )
  {
    days = *(&mdays + 1) + mdays;
    if ( month != 2 )
    {
      days += *(&mdays + 2);
      if ( month != 3 )
      {
        days += *(&mdays + 3);
        if ( month != 4 )
        {
          days += mdays_16.m128i_i32[0];
          if ( month != 5 )
          {
            days += mdays_16.m128i_i32[1];
            if ( month != 6 )
            {
              days += mdays_16.m128i_i32[2];
              if ( month != 7 )
              {
                days += mdays_16.m128i_i32[3];
                if ( month != 8 )
                {
                  v18 = days + mdays_32.m128i_i32[0];
                  days += mdays_32.m128i_i32[0];
                  if ( month != 9 )
                  {
                    days = mdays_32.m128i_i32[1] + v18;
                    if ( month != 10 )
                    {
                      days += mdays_32.m128i_i32[2];
                      if ( month != 11 )
                      {
                        v19 = mdays_32.m128i_i32[3] + days;
                        if ( month != 12 )
                          v19 += 31;
                        v7 += v19;
                        goto LABEL_26;
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  goto LABEL_30;
}

gcc realised that mdays only holds 13 elements, so valid indices stop at 12, and it compiled the for loop into 12 hardcoded comparisons.

This dumb but safe “optimization” was able to mitigate the bug.
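Put differently, the unrolled gcc output behaves roughly like the original loop with its bound clamped to the size of mdays. The sketch below is an illustration of that reading, not decompiler output, and sum_days_clamped is a made-up name.

```c
static const int mdays[] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

/*
 * Illustrative only: sums mdays[0..month-1] but stops at the last valid
 * index (12), no matter how large month is.
 */
int sum_days_clamped(int month)
{
    int limit = month;
    int days = 0;
    int i;

    /* Clamp the bound to the number of elements in mdays. */
    if (limit > 13)
        limit = 13;

    for (i = 0; i < limit; i++)
        days += mdays[i];

    return days;
}
```

An oversized month therefore yields a wrong date rather than a crash, which is why the gcc build hides the bug instead of fixing it.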

clang, however, uses a do-while loop and xmm registers in its optimization, so the bug still exists.

do
  {
    v15 = _mm_add_epi32(
            _mm_add_epi32(
              _mm_add_epi32(
                _mm_add_epi32(v15, *(__m128i *)((char *)&mdays + 4 * v16)),
                 *(__m128i *)((char *)&v26 + 4 * v16)),
               *(__m128i *)((char *)&v27 + 4 * v16 + 16)),
            *(__m128i *)((char *)&v29 + 4 * v16));
    v17 = _mm_add_epi32(
             _mm_add_epi32(
               _mm_add_epi32(
                 _mm_add_epi32(v17, *(__m128i *)((char *)&v25 + 4 * v16)),
                *(__m128i *)((char *)&v27 + 4 * v16)),
              *(__m128i *)((char *)&v28 + 4 * v16)),
            *(__m128i *)((char *)&v30 + 4 * v16));
    v16 += 32LL;
    v18 += 4LL;
  }
while ( v18 );

This concludes the first part of exploration with KLEE.

Conclusion

KLEE is a super powerful tool with an intuitive user interface and neat documentation.

Through practical tests we can confirm that KLEE is not just useful in theory, but is actually capable of auditing software bugs in the real world.

Within seconds of running, KLEE was able to find 2 unique bugs in a 1.1k star open source project.

However, I did not attain good code coverage unlike what was promised in the paper.

After 2 hours of running, only 6 complete paths were found.

There are a few possible reasons:

My usage of KLEE is incorrect. There may be a better way to instrument the target, or I may have missed some key arguments to KLEE.
My harness was unable to properly exercise the code, maybe due to too many symbolic constraints or too large a symbolic buffer.
The discovered bug prevented KLEE from exploring other parts of code following it.

In order to improve coverage, my next step of research will be into tools that can report the parts of code that were exercised by KLEE.
I will also attempt to patch the bugs and run KLEE again, so these bugs won’t interfere with KLEE’s exploration of other code beneath it.

Another area of research will be the performance of KLEE when coupled with binary lifting tools such as McSema, which can lift binaries to LLVM IR.

That will make a future blog post.