A Review of KLEE’s internal design
Introduction
Following my previous post on the overview of Symbolic Execution, let's review a specific implementation of an SE engine, to see how this approach can be concretized in the real world to aid program analysis and bug hunting against real software.
KLEE is selected due to its prevalence and proven ability to audit production-level software.
KLEE Overview
KLEE works on LLVM Intermediate Representation (IR), a RISC-like assembly language that the source code compiles to. KLEE directly interprets the LLVM IR to create symbolic processes known as states, each with its own registers, stack, heap, program counter, and path condition. At any one time, there can be a large number of states waiting for KLEE to execute.
The core of KLEE is an interpreter loop, which selects a state to run and symbolically executes a single instruction in the context of that state. This continues until no states remain, or a user-defined timeout is reached.
Instead of raw data values, expressions are stored in the symbolic registers, stack, and heap, in the form of a tree. The leaves are symbolic values or constants, and the interior nodes are LLVM IR operations, such as `add` and `sub`.
Take the

```llvm
%dst = add i32 %src0, %src1
```

instruction as an example: KLEE retrieves the addends from the `%src0` and `%src1` registers and writes a new expression `Add(%src0, %src1)` to the `%dst` register.
For efficiency, the code that builds expressions checks if all given operands are concrete (i.e., constants) and, if so, performs the operation natively, returning a constant expression.
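A minimal sketch of such a constant-folding expression builder (toy types, purely illustrative — not KLEE's actual `Expr` classes):

```cpp
#include <cassert>
#include <memory>

// Toy symbolic expression tree: leaves are constants or symbols,
// interior nodes are operations (only Add here).
struct Expr {
    enum Kind { Constant, Symbol, Add } kind;
    int value = 0;                    // valid when kind == Constant
    std::shared_ptr<Expr> lhs, rhs;  // valid when kind == Add
};

using ExprRef = std::shared_ptr<Expr>;

ExprRef mkConst(int v) {
    return std::make_shared<Expr>(Expr{Expr::Constant, v, nullptr, nullptr});
}
ExprRef mkSymbol() {
    return std::make_shared<Expr>(Expr{Expr::Symbol, 0, nullptr, nullptr});
}

// Builder: if both operands are concrete, fold natively and return a
// constant instead of allocating an interior Add node.
ExprRef mkAdd(ExprRef a, ExprRef b) {
    if (a->kind == Expr::Constant && b->kind == Expr::Constant)
        return mkConst(a->value + b->value);
    return std::make_shared<Expr>(Expr{Expr::Add, 0, a, b});
}
```

Because folding happens at construction time, fully concrete subtrees never allocate interior nodes, which keeps expressions small along mostly-concrete paths.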
When a conditional branch is hit, KLEE invokes the constraint solver to check whether the branch condition can be proven true or false, and eliminates infeasible paths (to prevent path explosion, as we've discussed previously). If both paths are feasible, KLEE clones the current state so that it can explore both paths, updating the instruction pointer and path condition on each path appropriately.
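The fork logic can be sketched as follows, with a brute-force feasibility check over a single 8-bit input standing in for the real SMT solver (all names here are illustrative, not KLEE's APIs):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy "constraints": predicates over one 8-bit symbolic input.
using Constraint = std::function<bool(int)>;

struct State {
    std::vector<Constraint> pathCondition;
    int pc = 0;  // instruction pointer (an index, for illustration)
};

// Stand-in for the solver: brute-force the 8-bit input domain.
bool feasible(const std::vector<Constraint>& cs) {
    for (int x = 0; x < 256; ++x) {
        bool ok = true;
        for (auto& c : cs) if (!c(x)) { ok = false; break; }
        if (ok) return true;
    }
    return false;
}

// On a conditional branch, keep only feasible successors; if both are
// feasible, clone the state and extend each path condition.
std::vector<State> branch(const State& s, Constraint cond,
                          int trueTarget, int falseTarget) {
    std::vector<State> out;
    auto withCond = [&](Constraint c, int target) {
        State t = s;  // clone (conceptually copy-on-write in KLEE)
        t.pathCondition.push_back(c);
        t.pc = target;
        if (feasible(t.pathCondition)) out.push_back(t);
    };
    withCond(cond, trueTarget);
    withCond([cond](int x) { return !cond(x); }, falseTarget);
    return out;
}
```

With an existing constraint `x < 10`, branching on `x == 5` yields two states, while branching on `x > 100` yields only the false-path state — the infeasible successor is silently dropped.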
Bug Checking
How does KLEE check for bugs?
Potentially dangerous operations implicitly generate branches that check if any input value exists that could cause an error.
For example, a division instruction generates a branch that checks for a zero divisor. Such branches work identically to normal branches. Thus, even when the check succeeds (i.e., an error is detected), execution continues on the false path, which adds the negation of the check as a constraint (e.g., making the divisor not zero). If an error is detected, KLEE generates a test case to trigger the error and terminates the state.
In higher-level terms, it means an

```c
a = b / c;
```

statement will effectively become

```c
if (c == 0) {
    /* report the error, generate a test case, terminate this state */
} else {
    a = b / c;
}
```
As with other dangerous operations, load and store instructions generate checks, in this case to check that the address is in-bounds of a valid memory object.
```c
a[j] = 999;
```

will become

```c
if (j >= 0 && j < sizeof(a) / sizeof(a[0])) {
    a[j] = 999;
} else {
    /* report the out-of-bounds error, generate a test case, terminate this state */
}
```
To facilitate the SMT solver, every memory object is mapped to an SMT array object (presumably handled with the theory of arrays).
When a dereferenced pointer `p` can refer to N objects (structs nested in structs to emulate type polymorphism?), KLEE clones the current state N times. In each state it constrains `p` to be within the bounds of its respective object and then performs the appropriate read or write operation.
Optimizations
KLEE’s optimizations include:
- copy-on-write at the object level instead of the page level
- sharing heap structures across multiple states, so that states can be cloned in constant time
- expression rewriting, e.g. `2x - x` is simplified to `x`
- constraint set simplification

For example, imagine an inexact constraint such as `x < 10` gets added, followed some time later by a constraint `x = 5`. KLEE actively simplifies the constraint set by rewriting previous constraints when new equality constraints are added. In this example, substituting `x = 5` into the first constraint simplifies it to TRUE, so KLEE eliminates the `x < 10` constraint.
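A toy version of this rewriting, for constraints restricted to the forms `var < k` and `var = k` (purely illustrative, far simpler than KLEE's real rewriter):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy constraint: "var < k" (op == '<') or "var = k" (op == '=').
struct C { std::string var; char op; int k; };

// When an equality arrives, earlier inequalities on the same variable
// that the equality already satisfies become redundant and are dropped.
void addConstraint(std::vector<C>& set, C c) {
    if (c.op == '=') {
        std::vector<C> kept;
        for (auto& old : set) {
            bool redundant = old.var == c.var &&
                             old.op == '<' && c.k < old.k;
            if (!redundant) kept.push_back(old);
        }
        set = kept;
    }
    set.push_back(c);
}
```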
- implied concretization

When a constraint such as `x + 1 = 10` is added to the path condition, `x` is concretized to 9 along that path, and it has to be 9. KLEE notes this and writes the constant to memory, so that future operations use cheaper constant expressions.
- constraint independence

Constraints that don't overlap in the memory they reference are independent, and when a query is issued, only the relevant constraints in the constraint set are sent to the solver.
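A sketch of how the relevant subset can be computed, via a transitive closure over shared variables (a simplification of KLEE's actual independence analysis):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy constraint: just the set of variables it mentions.
struct Constraint { std::set<std::string> vars; };

// Collect the indices of constraints transitively connected (via shared
// variables) to the query's variables; everything else is independent
// and never reaches the solver.
std::vector<int> relevant(const std::vector<Constraint>& cs,
                          std::set<std::string> queryVars) {
    std::vector<int> out;
    std::vector<bool> used(cs.size(), false);
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < cs.size(); ++i) {
            if (used[i]) continue;
            for (auto& v : cs[i].vars) {
                if (queryVars.count(v)) {
                    used[i] = true;
                    out.push_back((int)i);
                    // Absorb this constraint's variables for transitivity.
                    queryVars.insert(cs[i].vars.begin(), cs[i].vars.end());
                    changed = true;
                    break;
                }
            }
        }
    }
    return out;
}
```

Given `{x < 10, x = y + 1, z > 5}` and a query about `x`, only the first two constraints are selected; the `z` constraint is independent and skipped.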
- counter-example cache

Essentially a cache that stores constraint sets, each mapped to a concrete solution (or a special “no solution” sentinel if the set is unsatisfiable), in a UBTree. Based on the cache, KLEE uses three rules to reduce redundant solver queries:
- if a subset of a constraint set has no solution, the original constraint set has none either
- if the original constraint set is satisfiable, any subset of it is satisfiable with the same solution
- if a subset of a constraint set has a solution, it is LIKELY that this is also a solution to the original set

Because checking a potential solution is cheap, KLEE tries substituting in all solutions for subsets of the constraint set and returns a satisfying solution, if found.
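The three rules can be sketched over toy predicate constraints as follows (a simplification: KLEE keys its cache with a UBTree for fast subset/superset lookup, not a plain map):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <set>

// Toy constraints: predicates over a single int, identified by id.
using Pred = std::function<bool(int)>;

struct CexCache {
    std::map<int, Pred> preds;                        // id -> predicate
    std::map<std::set<int>, std::optional<int>> cache; // set -> solution or "no solution"

    bool satisfies(const std::set<int>& q, int x) {
        for (int id : q) if (!preds.at(id)(x)) return false;
        return true;
    }

    // Outer optional: did the cache answer at all?
    // Inner optional: the verdict (empty == UNSAT).
    std::optional<std::optional<int>> lookup(const std::set<int>& q) {
        for (auto& [key, sol] : cache) {
            bool keySubset = std::includes(q.begin(), q.end(),
                                           key.begin(), key.end());
            // Rule 1: an unsatisfiable subset makes the whole set UNSAT.
            if (keySubset && !sol) return std::optional<int>{};
            // Rule 3: a subset's solution LIKELY works; verify cheaply.
            if (keySubset && sol && satisfies(q, *sol)) return sol;
            // Rule 2: a satisfiable superset's solution also solves q.
            bool keySuperset = std::includes(key.begin(), key.end(),
                                             q.begin(), q.end());
            if (keySuperset && sol) return sol;
        }
        return std::nullopt;  // cache miss: fall through to the solver
    }
};
```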
This managed to reduce the number of queries KLEE makes to the SMT solver by a whopping 95%, and the efficacy increases over time as the cache fills up.
State Selection
Now, how does KLEE select a state to execute? It interleaves the following two methods in round-robin fashion:
- Random path selection
KLEE maintains a binary tree in which each interior node marks a point where the program branched, and each leaf holds the state of one branch. States are selected by traversing this tree from the root and randomly choosing which child to follow at each branch point.

Selection therefore does not depend on the number of states in each subtree, so it naturally favors states higher in the tree, which are more likely to explore new paths due to their less constrained path conditions.

It also avoids “starvation”, where a part of the program rapidly creating new states in a tight loop would otherwise keep expanding and taking up execution time.
- Coverage-Optimized search
A weight is computed for each state, based on heuristics like the minimum distance to an uncovered instruction, the call stack, and whether the state recently discovered new code.

I'm not sure why the call stack adds to the weight. Do they assume that a lengthier call stack means the program has probably already forked to its maximum? Let's review the code to find out.

`lib/Core/Searcher.cpp:176`
```cpp
double WeightedRandomSearcher::getWeight(ExecutionState *es) {
  // ...
}
```
It seems to take the `StackFrame` structure at the top of the stack (the current function) and retrieve, from its `callPathNode` member, the total number of instructions executed since that stack frame was entered. The calculated weight is the inverse of this number. This matches our hypothesis, as a path should be less likely to discover new code the more instructions it has already executed.
It's pretty logical that the total number of instructions executed is weighted less than the number of instructions executed in the current stack frame. However, I do not understand the choice behind giving `depth` such a high weighting. Shouldn't deeper code be less likely to cover new paths?
KLEE ensures that a state which frequently executes expensive instructions will not dominate execution time by running each state for a “time slice” defined by both a maximum number of instructions and a maximum amount of time.
Environment Modelling
Now it's time to discuss environment modelling, one of the greatest obstacles for modern symbolic executors. Keeping up with the symbolic-ness, operations that read environment variables, files, or network packets should be symbolic too: we conceptually want a read to return all values it could legally produce, rather than just a single concrete value. And when the program writes to its environment, the effects of these alterations should be reflected in subsequent reads.
KLEE achieves this by using C code to model around 40 system calls, such as `open`, `read`, `write`, `stat`, and `lseek`.
This is customizable and extensible by the user without understanding KLEE's internals. Amazing!
Filesystem syscalls that deal with symbolic values interact with a symbolic filesystem, where the user of KLEE can specify the number of files in it and their sizes.
Example: modelling the `open` syscall.

The call

```c
int fd = open("/etc/fstab", O_RDONLY);
```

with a concrete filename is delegated to the actual `open` syscall, while the call

```c
int fd = open(argv[1], O_RDONLY);
```

will instead result in two paths: one in which `fd` points to a single symbolic file in the environment, and one in which `fd` is set to -1, indicating an error.
Optionally, KLEE can also be configured to simulate environmental failures, such as a `write()` failing due to a full disk.
KLEE chooses to model syscalls instead of standard library functions because the huge number of standard library functions would make modelling them tedious and error-prone.
At the time of writing (2008), KLEE was able to achieve over 90% average line coverage across GNU Coreutils.
0.0 impressive!
Conclusion
The dive into KLEE's internals is extremely interesting, and it shows how researchers are able to bridge theoretical advances into production-level tools.
Our earlier doubt about how symbolic executors are able to find bugs is also cleared up by reading the paper (https://llvm.org/pubs/2008-12-OSDI-KLEE.pdf): we now know that KLEE rewrites dangerous operations to perform error checking.
With new bug types emerging (such as type confusion, which was not as prevalent in 2008), can KLEE be easily extended to include such checks? I don't think so.
However, it's reassuring that users can easily customize the modelled syscalls without reading through the ten thousand lines of C++.
The results seem very promising even for vanilla usage, and I can’t wait to play with it in the next post :)