A Review of KLEE’s internal design
Introduction
Following my previous post on the overview of Symbolic Execution, let's review a specific implementation of an SE engine, to see how this approach can be concretized in the real world to aid program analysis and bug hunting against real software.
KLEE is selected due to its prevalence and proven ability to audit production-level software.
KLEE Overview
KLEE works on LLVM Intermediate Representation (IR), a RISC-like assembly language that the source code compiles to. KLEE directly interprets the LLVM IR to create symbolic processes known as states, each with its own registers, stack, heap, program counter, and path condition. At any one time, there can be a large number of states waiting for KLEE to execute.
The core of KLEE is an interpreter loop, which selects a state to run and symbolically executes a single instruction in the context of that state. This continues until no states remain, or a user-defined timeout is reached.
Instead of raw data values, expressions are stored in the symbolic registers, stack, and heap, in the form of a tree. The leaves are symbolic values or constants, and the interior nodes are LLVM IR operations, such as `add` and `sub`.
Take the

```llvm
%dst = add i32 %src0, %src1
```

instruction as an example: KLEE retrieves the addends from the `%src0` and `%src1` registers and writes a new expression `Add(%src0, %src1)` to the `%dst` register.
For efficiency, the code that builds expressions checks if all given operands are concrete (i.e., constants) and, if so, performs the operation natively, returning a constant expression.
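A minimal sketch of such a constant-folding expression builder (toy types, purely illustrative — not KLEE's actual `Expr` classes):

```cpp
#include <cassert>
#include <memory>

// Toy symbolic expression tree: leaves are constants or symbols,
// interior nodes are operations (only Add here).
struct Expr {
    enum Kind { Constant, Symbol, Add } kind;
    int value = 0;                    // valid when kind == Constant
    std::shared_ptr<Expr> lhs, rhs;  // valid when kind == Add
};

using ExprRef = std::shared_ptr<Expr>;

ExprRef mkConst(int v) {
    return std::make_shared<Expr>(Expr{Expr::Constant, v, nullptr, nullptr});
}
ExprRef mkSymbol() {
    return std::make_shared<Expr>(Expr{Expr::Symbol, 0, nullptr, nullptr});
}

// Builder: if both operands are concrete, fold natively and return a
// constant instead of allocating an interior Add node.
ExprRef mkAdd(ExprRef a, ExprRef b) {
    if (a->kind == Expr::Constant && b->kind == Expr::Constant)
        return mkConst(a->value + b->value);
    return std::make_shared<Expr>(Expr{Expr::Add, 0, a, b});
}
```

Because folding happens at construction time, fully concrete subtrees never allocate interior nodes, which keeps expressions small along mostly-concrete paths.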
When a conditional branch is hit, KLEE invokes the constraint solver to check whether the branch condition can be proven true or false, and eliminates infeasible paths (to prevent path explosion, as we've discussed previously). If both paths are feasible, KLEE clones the current state so that it can explore both paths, updating the instruction pointer and path condition on each path appropriately.
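The fork logic can be sketched as follows, with a brute-force feasibility check over a single 8-bit input standing in for the real SMT solver (all names here are illustrative, not KLEE's APIs):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy "constraints": predicates over one 8-bit symbolic input.
using Constraint = std::function<bool(int)>;

struct State {
    std::vector<Constraint> pathCondition;
    int pc = 0;  // instruction pointer (an index, for illustration)
};

// Stand-in for the solver: brute-force the 8-bit input domain.
bool feasible(const std::vector<Constraint>& cs) {
    for (int x = 0; x < 256; ++x) {
        bool ok = true;
        for (auto& c : cs) if (!c(x)) { ok = false; break; }
        if (ok) return true;
    }
    return false;
}

// On a conditional branch, keep only feasible successors; if both are
// feasible, clone the state and extend each path condition.
std::vector<State> branch(const State& s, Constraint cond,
                          int trueTarget, int falseTarget) {
    std::vector<State> out;
    auto withCond = [&](Constraint c, int target) {
        State t = s;  // clone (conceptually copy-on-write in KLEE)
        t.pathCondition.push_back(c);
        t.pc = target;
        if (feasible(t.pathCondition)) out.push_back(t);
    };
    withCond(cond, trueTarget);
    withCond([cond](int x) { return !cond(x); }, falseTarget);
    return out;
}
```

With an existing constraint `x < 10`, branching on `x == 5` yields two states, while branching on `x > 100` yields only the false-path state — the infeasible successor is silently dropped.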
Bug Checking
How does KLEE check for bugs?
Potentially dangerous operations implicitly generate branches that check if any input value exists that could cause an error.
For example, a division instruction generates a branch that checks for a zero divisor. Such branches work identically to normal branches. Thus, even when the check succeeds (i.e., an error is detected), execution continues on the false path, which adds the negation of the check as a constraint (e.g., making the divisor not zero). If an error is detected, KLEE generates a test case to trigger the error and terminates the state.
In higher-level terms, it means an

```c
a = b / c;
```

statement will effectively become

```c
if (c == 0) {
    /* report the error, generate a test case, terminate this state */
} else {
    a = b / c;
}
```
As with other dangerous operations, load and store instructions generate checks, in this case to check that the address is in-bounds of a valid memory object.
```c
a[j] = 999;
```

will become

```c
if (j >= 0 && j < sizeof(a) / sizeof(a[0])) {
    a[j] = 999;
} else {
    /* report the out-of-bounds error, generate a test case, terminate this state */
}
```
To facilitate the SMT solver, every memory object is mapped to an SMT array object (presumably handled with the theory of arrays).
When a dereferenced pointer `p` can refer to N objects (structs nested in structs to emulate type polymorphism?), KLEE clones the current state N times. In each state it constrains `p` to be within the bounds of its respective object and then performs the appropriate read or write operation.
Optimizations
KLEE’s optimizations include:
- copy-on-write at the object level instead of the page level
- sharing heap structures across multiple states, so that states can be cloned in constant time
- expression rewriting, e.g. `2x - x` is simplified to `x`
- constraint set simplification

For example, imagine an inexact constraint such as `x < 10` gets added, followed some time later by a constraint `x = 5`. KLEE actively simplifies the constraint set by rewriting previous constraints when new equality constraints are added. In this example, substituting `x = 5` into the first constraint simplifies it to TRUE, so KLEE eliminates the `x < 10` constraint.
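A toy version of this rewriting, for constraints restricted to the forms `var < k` and `var = k` (purely illustrative, far simpler than KLEE's real rewriter):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy constraint: "var < k" (op == '<') or "var = k" (op == '=').
struct C { std::string var; char op; int k; };

// When an equality arrives, earlier inequalities on the same variable
// that the equality already satisfies become redundant and are dropped.
void addConstraint(std::vector<C>& set, C c) {
    if (c.op == '=') {
        std::vector<C> kept;
        for (auto& old : set) {
            bool redundant = old.var == c.var &&
                             old.op == '<' && c.k < old.k;
            if (!redundant) kept.push_back(old);
        }
        set = kept;
    }
    set.push_back(c);
}
```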
- implied concretization

When a constraint such as `x + 1 = 10` is added to the path condition, `x` is concretized to 9 along that path, and it has to be 9. KLEE notes this and writes the constant to memory, so that future operations use cheaper constant expressions.
- constraint independence

Constraints that don't overlap in the memory they reference are independent, and when a query is issued, only the relevant constraints in the constraint set are sent to the solver.
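A sketch of how the relevant subset can be computed, via a transitive closure over shared variables (a simplification of KLEE's actual independence analysis):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy constraint: just the set of variables it mentions.
struct Constraint { std::set<std::string> vars; };

// Collect the indices of constraints transitively connected (via shared
// variables) to the query's variables; everything else is independent
// and never reaches the solver.
std::vector<int> relevant(const std::vector<Constraint>& cs,
                          std::set<std::string> queryVars) {
    std::vector<int> out;
    std::vector<bool> used(cs.size(), false);
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < cs.size(); ++i) {
            if (used[i]) continue;
            for (auto& v : cs[i].vars) {
                if (queryVars.count(v)) {
                    used[i] = true;
                    out.push_back((int)i);
                    // Absorb this constraint's variables for transitivity.
                    queryVars.insert(cs[i].vars.begin(), cs[i].vars.end());
                    changed = true;
                    break;
                }
            }
        }
    }
    return out;
}
```

Given `{x < 10, x = y + 1, z > 5}` and a query about `x`, only the first two constraints are selected; the `z` constraint is independent and skipped.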
- counter-example cache

Essentially a cache that stores constraint sets, each mapped to a concrete solution (or a special “no solution” sentinel if the set is unsatisfiable), in a UBTree. Based on the cache, KLEE uses three rules to reduce redundant solver queries:
- if a subset of a constraint set has no solution, the original constraint set has none either
- if the original constraint set is satisfiable, any subset of it is satisfiable with the same solution
- if a subset of a constraint set has a solution, it is LIKELY that this is also a solution to the original set

Because checking a potential solution is cheap, KLEE tries substituting in all solutions for subsets of the constraint set and returns a satisfying solution, if found.
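The three rules can be sketched over toy predicate constraints as follows (a simplification: KLEE keys its cache with a UBTree for fast subset/superset lookup, not a plain map):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <set>

// Toy constraints: predicates over a single int, identified by id.
using Pred = std::function<bool(int)>;

struct CexCache {
    std::map<int, Pred> preds;                        // id -> predicate
    std::map<std::set<int>, std::optional<int>> cache; // set -> solution or "no solution"

    bool satisfies(const std::set<int>& q, int x) {
        for (int id : q) if (!preds.at(id)(x)) return false;
        return true;
    }

    // Outer optional: did the cache answer at all?
    // Inner optional: the verdict (empty == UNSAT).
    std::optional<std::optional<int>> lookup(const std::set<int>& q) {
        for (auto& [key, sol] : cache) {
            bool keySubset = std::includes(q.begin(), q.end(),
                                           key.begin(), key.end());
            // Rule 1: an unsatisfiable subset makes the whole set UNSAT.
            if (keySubset && !sol) return std::optional<int>{};
            // Rule 3: a subset's solution LIKELY works; verify cheaply.
            if (keySubset && sol && satisfies(q, *sol)) return sol;
            // Rule 2: a satisfiable superset's solution also solves q.
            bool keySuperset = std::includes(key.begin(), key.end(),
                                             q.begin(), q.end());
            if (keySuperset && sol) return sol;
        }
        return std::nullopt;  // cache miss: fall through to the solver
    }
};
```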
This managed to reduce the number of queries KLEE makes to the SMT solver by a whopping 95%, and the efficacy increases over time as the cache fills up.
State Selection
Now, how does KLEE select a state to execute? It interleaves the following two methods in round-robin fashion:
- Random path selection
KLEE maintains a binary tree in which each interior node marks a point where the program branched, and each leaf holds the state of one branch. States are selected by traversing this tree from the root and randomly choosing which child to follow at each branch point.

Selection therefore does not depend on the number of states in each subtree, so it naturally favors states higher in the tree, which are more likely to explore new paths due to their less constrained path conditions.

It also avoids “starvation”, where a part of the program rapidly creating new states in a tight loop would otherwise keep expanding and taking up execution time.
- Coverage-Optimized search
A weight is computed for each state, based on heuristics like the minimum distance to an uncovered instruction, the call stack, and whether the state recently discovered new code.

I'm not sure why the call stack adds to the weight. Do they assume that a lengthier call stack means the program has probably already forked to its maximum? Let's review the code to find out.

`lib/Core/Searcher.cpp:176`
```cpp
double WeightedRandomSearcher::getWeight(ExecutionState *es) {
  // ...
}
```
It seems to take the `StackFrame` structure at the top of the stack (the current function) and retrieve, from its `callPathNode` member, the total number of instructions executed since that stack frame was entered. The calculated weight is the inverse of this number. This matches our hypothesis, as a path should be less likely to discover new code the more instructions it has already executed.
It's pretty logical that the total number of instructions executed is weighted less than the number of instructions executed in the current stack frame. However, I do not understand the choice behind giving `depth` such a high weighting. Shouldn't deeper code be less likely to cover new paths?
KLEE ensures that a state which frequently executes expensive instructions will not dominate execution time by running each state for a “time slice” defined by both a maximum number of instructions and a maximum amount of time.
Environment Modelling
Now it's time to discuss environment modelling, one of the greatest obstacles for modern symbolic executors. Keeping up with the symbolic-ness, operations that read environment variables, files, or network packets should be symbolic too: we conceptually want a read to return all values it could legally produce, rather than just a single concrete value. And when the program writes to its environment, the effects of these alterations should be reflected in subsequent reads.
KLEE achieves this by using C code to model around 40 system calls, such as `open`, `read`, `write`, `stat`, and `lseek`.
This is customizable and extensible by the user without understanding KLEE's internals. Amazing!
Filesystem syscalls that deal with symbolic values interact with a symbolic filesystem, where the user of KLEE can specify the number of files in it and their sizes.
Example: modelling the `open` syscall.

The call

```c
int fd = open("/etc/fstab", O_RDONLY);
```

with a concrete filename is delegated to the actual `open` syscall, while the call

```c
int fd = open(argv[1], O_RDONLY);
```

will instead result in two paths: one in which `fd` points to a single symbolic file in the environment, and one in which `fd` is set to -1, indicating an error.
Optionally, KLEE can also be configured to simulate environmental failures, such as a `write()` failing due to a full disk.
KLEE chooses to model syscalls instead of standard library functions because the huge number of standard library functions would make modelling them tedious and error-prone.
At the time of writing (2008), KLEE was able to achieve over 90% average line coverage across GNU Coreutils.
0.0 impressive!
Conclusion
The dive into KLEE's internals is extremely interesting, and it shows how researchers are able to bridge theoretical advances into production-level tools.
Our earlier doubt about how symbolic executors are able to find bugs is also cleared up by reading the paper (https://llvm.org/pubs/2008-12-OSDI-KLEE.pdf): we now know that KLEE rewrites dangerous operations to perform error checking.
With new bug types emerging (such as type confusion, which was not as prevalent in 2008), can KLEE be easily extended to include such checks? I don't think so.
However, it's reassuring that users can easily customize the modelled syscalls without reading through the ten thousand lines of C++.
The results seem very promising even for vanilla usage, and I can’t wait to play with it in the next post :)