Testing KLEE against a variety of softwares
Introduction
After going through all the theory in the previous post, let’s see how KLEE performs against normal toy programs, a buggy toy program, a toy password checker and an open source excel library.
KLEE Setup
Download KLEE as a docker image:
1 | docker pull klee/klee |
Create a persistent container using the downloaded image:
1 | docker run -v "C:\Users\User\Desktop\hacking\klee":/host -ti --name=klee_261222 --ulimit="stack=-1:-1" klee/klee |
Notice we didn’t use --rm
so the container will not be destroyed when we exit it from it and we also gave the container a name using the --name
flag.
Note the --ulimit
option sets an unlimited stack size inside the container. This is to avoid stack overflow issues when running KLEE.
The -v
option mounts the host folder into the docker container at /host
, so we can access source files to audit with KLEE.
If this worked correctly your shell prompt will have changed and you will be the klee
user.
To exit, just use the exit
command.
List all containers with
1 | docker ps -a |
We can always start the container again with
1 | docker start -ai klee_261222 |
To delete the container:
1 | docker rm klee_261222 |
Test1
1 |
|
In this small c program we are testing the get_sign
function, which has 3 possible paths.
In order to test this function with KLEE, we need to run it on symbolic input.
To mark a variable as symbolic, we use the klee_make_symbolic() function (defined in klee/klee.h), which takes three arguments:
- the address of the variable (memory location) that we want to treat as symbolic
- the size of the variable
- a name (can be anything)
Now we need to compile this code into LLVM bitcode.
1 | clang -I /home/klee/klee_src/include -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone test1.c |
The -I
argument is used so that the compiler can find klee/klee.h
, which contains definitions for the intrinsic functions used to interact with the KLEE virtual machine, such as klee_make_symbolic
.
It is useful to build with -g
to add debug information to the bitcode file, which we use to generate source line level statistics information.
All optimizations are disbaled with -O0 -Xclang -disable-O0-optnone
, and we only compile and don’t link with -c
.
Now we see test1.bc
1 | klee@9523002f7332:/host/test1$ ls |
Note some security checks by KLEE requires clang to use certain compilation flags.
For example the -fsanitize=signed-integer-overflow
flag is required to detect signed integer overflow. These clang options instrument program.bc with overflow checks that are used by KLEE.
Using llvm-dis
we can check out the underlying bitcode
1 | klee@9523002f7332:/host/test1$ llvm-dis test1.bc |
KLEE
will work with this IR code to perform symbolic execution.
Now we can run KLEE:
1 | klee@9523002f7332:/host/test1$ klee test1.bc |
And we see it detects 3 paths in less then a second.
1 | klee@9523002f7332:/host/test1$ cd klee-out-0/ |
KLEE returns 3 test case files, which corresponds to each path of the program KLEE explored.
These files are in binary format, and can be parsed with ktest-tool
.
For information regarding the other files that KLEE returns, check out https://klee.github.io/docs/files/
1 | klee@9523002f7332:/host/test1/klee-out-0$ ktest-tool test000001.ktest |
As we have expected, KLEE returns concrete values to exercise all 3 paths.
0, above 0, and less than 0.
Amazing! Now let’s move on to a slightly more interesting program.
Test2
1 |
|
The above is a small regex parser that supports ^
, *
, .
and $
.
For example, .*$
will match any text input.
This program is much more complicated than the previous, and let’s see how KLEE does.
1 | klee@9523002f7332:/host/test2$ clang -I /home/klee/klee_src/include -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone test2.c |
In roughly 10 seconds, KLEE completed the assessment, completing 6675 paths and finding 2 memory errors.
That’s a lot of test files to store.
We can limit the test generation(concretization) to states that actually covered new code with --only-output-states-covering-new
as argument to KLEE.
If we had done that, only 16 test cases would have been generated. (We will use this for all following tests)
Note that many realistic programs have an infinite (or extremely large) number of paths through them, and it is common that KLEE will not terminate.
By default KLEE will run until the user presses Control-C (i.e. klee gets a SIGINT), but there are additional options to limit KLEE’s runtime and memory usage:
- max-time=<time span>: Halt execution after the given amount of time, e.g. 10min or 1h5s.
- max-forks=N: Stop forking after N symbolic branches, and run the remaining paths to termination.
- max-memory=N: Try to limit memory consumption to N megabytes.
When KLEE detects an error in the program being executed it will generate a test case which exhibits the error, and write some additional information about the error into a file testN.TYPE.err
, where N is the test case number, and TYPE identifies the kind of error.
For all program errors, KLEE will write a simple backtrace into the .err file.
1 | klee@9523002f7332:/host/test2/klee-out-0$ ls *.err |
We can see both of the errors occur at the same source line 28.
1 | if (re[1] == '*') |
However if we check the concrete testcase:
1 | klee@9523002f7332:/host/test2/klee-out-0$ ktest-tool test000022.ktest |
We see the buffer contains ^.*.*.*
, which lacks the terminating null that the program requires.
This is actually an issue with our harness!
We will need to add the line
1 | klee_assume(re[SIZE - 1] == '\0'); |
to force KLEE to only explore states where the buffer is null terminated.
klee_assume
takes a single argument (an unsigned integer) which generally should be some kind of conditional expression, and “assumes” that expression to be true on the current path (if that can never happen, i.e. the expression is provably false, KLEE will report an error).
A warning from the developers of KLEE:
There is one important caveat when using klee_assume with multiple conditions.
Remember that boolean conditionals like ‘&&’ and ‘||’ may be compiled into code which branches before computing the result of the expression.
In such situations KLEE will branch the process before it reaches the call to klee_assume, which may result in exploring unnecessary additional states.
For this reason it is good to use as simple expressions as possible to klee_assume (for example splitting a single call into multiple ones), and to use the ‘&’ and ‘|’ operators instead of the short-circuiting ones.
1 | klee@9523002f7332:/host/test2$ klee --only-output-states-covering-new test2.bc |
This time no error is found, and all paths are exercised.
Now let’s move on to test 3.
Test3
1 |
|
This is a checksum program, that checks if the header bytes of a buffer matches a certain requirement before moving on with a memory operation.
Traditional blackbox fuzzers often fare poorly against these programs, because it takes the mutation engine a long time before hitting the correct bytes.
Of course we can use klee_assume
to feed in the checksum, but let’s see how long KLEE will take to figure it out.
After solving the checksum there’s a logic bug that leads to a heap OOB write for KLEE to discover.
We assume sz > 0 so KLEE doesn’t model malloc to returning a failure.
Results:
1 | klee@9523002f7332:/host/test3$ clang -I /home/klee/klee_src/include -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone test3.c |
In just a second KLEE was able to solve the checksum and discover the memory bug.
It throws 2 errors, one is a concretized symbolic size
error, which is just KLEE’s way to warn the user that the arguments to malloc is freely controllable and may lead to a huge allocation depleting system memory.
The second error is more interesting, and it seems like our OOB write.
1 | klee@9523002f7332:/host/test3$ cd klee-out-0/;ls |
We can clearly see the mem size
argment being set to 1, so theoretically only 1 byte of memory is allocated.
However, the request buffer triggers a write of 2 bytes, leading to a heap overflow.(with the checksum easily solved!)
The problem now is, most malloc
implementations have a mimimum size policy.
On linux, a minimum size of 0x20
will be returned, even if you request for 1 byte.
That means we will not be able to reproduce the bug as a crash if we use it directly on a production copy of the binary, because writing 2 bytes to a 0x20 sized allocation is not an issue.
(Another caveat of blackbox fuzzing, where such subtle overflows will not be detected as a crash)
We can enable address sanitizer, discussed in the first post of my fuzzing series and debug symbols to triage the bug.
1 | klee@9523002f7332:/host/test3$ export LD_LIBRARY_PATH=/home/klee/klee_build/lib/:$LD_LIBRARY_PATH |
Replay the testcase and we successfully discover a memory corruption bug.
ASAN tells us the bug is in special_memA /host/test3/test3.c:37
, and we have a WRITE of 1 byte to the right of a partially addressable block marked as [1]
, which is our rightfully allocated 1 byte of memory.
Line 37 corresponds to
1 | *mem++ = 'A'; |
Up to this point, KLEE has done exceptionally well in our 3 tests, but we have yet to test its true capabilities, which is to reason about symbolic environments.
Test4
1 |
|
This rather contrived program(but more closely resembles a real life software) will not be fully explored if we had used our previous steps to analyse it.
The first reason is that the program takes in data from its arguments instead of hardcoding in the source.
We can of course patch the program such that it takes a hardcoded symbolic buffer instead, but KLEE can deal with arguments too.
The second reason is that the flow of the program entirely depends on the return value of strcmp
, a library function.
By default, KLEE will not symbolically execute into library functions, but instead model them.
For example, the strcmp
function may be modelled to always returning 1, and we will never find the correct password.
Lastly, it also tries to access a file(we provide the name as argument) to check for another password.
We will have to make the file symbolic as well.
KLEE has a number of arguments to deal with the environment.
1 | klee@9523002f7332:/host/test4$ klee --only-output-states-covering-new --libc=uclibc --posix-runtime test4.bc -sym-arg 20 A -sym-files 1 20 |
First of all, we deal with the issue of library functions.
By using --libc=uclibc
, we force KLEE to link the LLVM bitcode with an instrumented version of uclibc
, so KLEE can symbolically analyse into library functions.
The --posix-runtime
flag adds support to model low level posix syscalls, as we discussed in the theory paper previously. This two flags should be used together in most cases.
The line test4.bc -sym-arg 20 A -sym-files 1 20
passes the arguments to the test4.bc
program.
Instead of a test4.bc password xx.txt
concrete argument, we pass symbolic arguments.
-sym-arg 20
defines a single symbolic argument of 20 bytes long, and -sym-files 1 20
creates a single symbolic file of 20 bytes size.
What about the A
?
That’s just weird KLEE syntax which I don’t agree with.
KLEE names its symbolic files as A, B, C… etc, so by passing A
we pass the symbolic file to the program.
Quoting from an issue opened in github(https://github.com/klee/klee/issues/712):
1 | If your tested applications does a fopen(argv[1], ...), the KLEE runtime intercepts the call and the symbolic file gets opened. If you provide a different file name (e.g. /etc/passwd), the real file gets opened. |
Note that the position of arguments matter.
The binary name and binary arguments must be the last in the arguments passed to KLEE.
Now if we check the output directory, we see the test case used to find the correct password.
1 | klee@9523002f7332:/host/test4$ cd klee-out-0/;ls |
Note how we used klee-replay
to replay the testcase this time instead of KTEST_FILE
.
This is because the KTEST_FILE
method is unable to replay symbolic arguments.
Without external tools, KLEE is unable to display information that helps us determine the exact test case that triggered the “Password found!” path.
This is quite annoying as we will have to replay every test case to find out.
One possible solution is to compile uclibc such that printf
is symbolically analysed too with -DKLEE_SYM_PRINTF
, then capture stdout by making it symbolic with -sym-stdout
.
However this adds a ton of overhead.
At point of writing I’m unsure of any tools that can aid in categorizing test cases.
Anyways, it was still able to find the password :)
Enough with toy programs, let’s try KLEE on some real open source software.
Test5
https://github.com/jmcnamara/libxlsxwriter
Reading the docs(https://libxlsxwriter.github.io/getting_started.html) tells us that we need to use make
to compile the source to a shared library.
Afterwards we can link our harness with this library using KLEE’s --link-llvm-lib
flag.
First we edit the Makefile to change the build location:
1 | PREFIX ?= /host/test5/libxlsxwriter/ |
Then we build the library with wllvm
, which allows us to later extract the LLVM bitcode from the compiled library.
1 | klee@9523002f7332:/host/test5/libxlsxwriter$ export LLVM_COMPILER=clang |
We compile with -O1 -disable-llvm-passes
instead of -O0 -disable-O0-optnone
, because it produces bitcode that works better with KLEE’s --optimize
.
-D__NO_STRING_INLINES -D_FORTIFY_SOURCE=0 -U__OPTIMIZE__
prevents clang from replacing functions with safer versions, which KLEE might not support.
Now we should have the libraries built.
1 | klee@9523002f7332:/host/test5/libxlsxwriter$ cd lib |
.a
is static, while .so
is dynamic.
We’ll use the dynamic version in this case so KLEE can use its own uclibc.
1 | klee@9523002f7332:/host/test5/libxlsxwriter/lib$ extract-bc libxlsxwriter.so |
With the libraries in place, we need a harness to trigger library functions.
In my first run I’ll exercise the part of code that deals with formulas, since parsing formulas is relatively complex.
https://libxlsxwriter.github.io/working_with_formulas.html
1 | The worksheet_write_dynamic_array_formula() function writes an Excel 365 dynamic array formula to a cell range. |
Modifying to add KLEE instrumentation:
1 |
|
Then we run KLEE:
1 | klee --optimize --only-output-states-covering-new --libc=uclibc --posix-runtime --link-llvm-lib libxlsxwriter.so.bc test5.bc |
After 2 hours:
1 | KLEE: done: total instructions = 1052415449 |
Only 6 completed paths, that’s quite terrible actually…
I’m not sure what went wrong but maybe my buffer size was too large such that the STP solver is slowly solving constraints?
I’ll have to instrument this with gcov
or something to find out.
KLEE did however find one trivial bug in worksheet.c:7976 within seconds of running:
1 | klee@9523002f7332:/host/test5/libxlsxwriter/lib/klee-out-0$ cat test000001.ptr.err |
The offending code is:
1 | /* Copy and trip leading "{=" from formula. */ |
When formula
is a single {
, formula_copy
on the heap will contain just the null terminator.
strlen(formula_copy)
will then return 0, and formula_copy
will perform an OOB access to the -1th element.
The fix is to first check if the current byte is a null, and skip the following checks if it is.
Test6
Let’s try to test the date function this time.
https://libxlsxwriter.github.io/working_with_dates.html
Harness:
1 |
|
Result:
1 | klee@9523002f7332:/host/test5/libxlsxwriter/lib/klee-last$ cat test000003.ptr.err |
Another memory OOB, this time in utility.c:399
.
Offending Code:
1 | /* |
As shown above, user specified month
is used as a counter to iterate the array mdays
.
Specifying a large value for month
leads to a controllable array OOB access, resulting in a segmentation fault.
The fix will be to check that 0 <= months <= 12
I’m also not a fan of the fact that signed integers are used to represent year, month, day and these non-negative values.
1 | klee@9523002f7332:/host/test5/libxlsxwriter/lib$ head bug.c |
Fun fact, this bug is actually not reproducable on a gcc compiled version of this library because…:
1 | if ( month > 0 ) |
gcc realised that mdays
can legally only be 12 elements large, so it compiled the for loop into 12 hardcoded comparisons.
This dumb but safe “optimization” was able to mitigate the bug.
clang however uses a while loop and xmm registers in its optimization, thus the bug still exists.
1 | do |
This concludes the first part of exploration with KLEE.
Conclusion
KLEE is a super powerful tool with an intuitive user interface and neat documentation.
Through practical tests we can confirm that KLEE is not just useful in theory, but is actually capable of auditing software bugs in the real world.
Within seconds of running, KLEE was able to find 2 unique bugs in a 1.1k star open source project.
However, I did not attain good code coverage unlike what was promised in the paper.
After 2 hours of running, only 6 complete paths were found.
There are a few possible reasons:
- My usage of KLEE is incorrect. There is a better way to instrument the target and I’ve missed out on some key arguments to KLEE.
- My harness was unable to properly exercise the code, maybe due to too many symbolic constraints or too large a symbolic buffer.
- The discovered bug prevented KLEE from exploring other parts of code following it.
In order to improve coverage, my next step of research will be into tools that can report the parts of code that were exercised by KLEE.
I will also attempt to patch the bugs and run KLEE again, so these bugs won’t interfere with KLEE’s exploration of other code beneath it.
Another area of research will be the performance of KLEE when coupled with binary lifting tools, such as mcsema
that can convert binary into LLVM IR.
That will make a future blog post.