A Review of Fuzzing Tools and Methods - James Fell
Introduction
My current goal for 2023 is to take a deep dive into fuzzing and Windows kernel internals.
I figured it would be helpful to document the papers I read and provide summaries, to help myself (and hopefully others too) learn.
I’ll begin with the paper A Review of Fuzzing Tools and Methods by James Fell, originally published in PenTest Magazine on 10th March 2017.
This should be a great starting point to learn about the different types of fuzzing, so I can dive into more specific papers in the future.
You can find the full paper here:
https://wcventure.github.io/FuzzingPaper/Paper/2017_review.pdf
Anything in italics is my own opinion, and not reflective of the author’s ideas as presented in the paper.
Beginning
The first portion of the paper was dedicated to discussing the current landscape of software vulnerabilities.
A brief review of common vulnerability classes was presented, including:
- Stack based buffer overflows
- Heap based buffer overflows
- Integer overflows
- Format string bugs
- Off-by-one vulnerabilities
- Double free bugs
- Use-after-free bugs
Subtle overflows that do not cause a crash upfront (such as an off-by-one heap overflow) can be identified by compiling the program with the -fsanitize=address option of the gcc compiler to activate AddressSanitizer (ASAN).
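As a minimal sketch of my own (not from the paper), here is a one-byte heap overflow that typically runs without any visible symptom, but which ASAN reports the moment the bad write happens:
```c
// Hypothetical example: an off-by-one heap overflow that rarely crashes on
// its own, but is reported immediately when compiled with ASAN, e.g.
//   gcc -g -fsanitize=address off_by_one.c -o off_by_one
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *buf = malloc(8);
    if (!buf) return 1;
    // strcpy writes 9 bytes (8 characters plus the NUL terminator) into an
    // 8-byte allocation: a one-byte heap overflow.
    strcpy(buf, "AAAAAAAA");
    free(buf);
    return 0;
}
```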
On Windows, I believe enabling full page heap with gflags /p /enable app.exe /full will similarly allow heap overflow bugs to be detected upfront. However, this has the side effect of potentially exhausting memory, since every heap allocation ends up taking at least a full page.
Upon finding a crash, the exploitable plugin, available for both gdb and WinDbg, can provide a hint as to whether the bug is exploitable to gain code execution.
Real-life examples of these vulnerability types were also mentioned:
- UAF : CVE-2014-0322
- Double-Free : CVE-2015-0058
- Integer Overflow : CVE-2010-2753
- Off-By-One : CVE-2012-5144
Finally, the author brought up some hindrances to the introduction of software bugs. These include advancements in compilers that produce warnings to deter the use of dangerous functions such as gets, the proliferation of higher-level languages like Python that perform automatic bounds checking, as well as the development of frameworks for coding best practices, for example the Microsoft Security Development Lifecycle (SDL).
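As a small illustration of my own, this is the kind of change those compiler warnings push developers towards:
```c
#include <stdio.h>

int main(void) {
    char buf[64];
    // gets() cannot limit how much it writes into buf, so modern toolchains
    // warn about it (and C11 removed it from the standard entirely):
    //   gets(buf);
    // The bounded replacement reads at most sizeof(buf) - 1 characters:
    if (fgets(buf, sizeof(buf), stdin) != NULL)
        printf("%s", buf);
    return 0;
}
```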
Static Analysis
Before talking about fuzzing, a form of dynamic analysis, the author briefly examined static analysis and its workings. In fact, fuzzing incorporates techniques from static analysis to improve results as well.
Pros of Static Analysis:
- Theoretically complete code coverage
Cons of Static Analysis:
- Certain code paths that contain a bug might not be reachable by an external user, so the bug has low exploitability.
- Large amounts of manual analysis required.
- Many false positives.
- Difficult to understand certain bugs without a runtime environment.
- Vulnerabilities in libraries that are imported will be missed if these libraries are not also analysed.
It seems like static analysis is theoretically a better approach to finding bugs, since the cons are mostly manpower limitations. For example, with enough enumeration of scenarios, it is possible to emulate all the possible runtime environments in static analysis (ChatGPT provides a glimpse of hope). In the real world, however, it is currently infeasible to conduct vulnerability research on a huge code base using pure static analysis.
Some popular static analysis tools include Cppcheck and FlawFinder for C/C++, RIPS for PHP, and FindBugs for Java. Certain tools can be integrated into common IDEs to incorporate static analysis into the development lifecycle.
How do static analysis tools work?
On the simple end, some tools work by simply searching for specific, potentially dangerous function names or signatures. Upon a match, the tool reports back to the developer with an explanation of the possible bugs that can occur.
This reminds me of common web vulnerability scanners. These are usually only capable of finding low-hanging fruit, and produce huge amounts of false positives. As a vulnerability researcher, this should not return many useful findings for a moderately well maintained project, since the developers will probably have run the analysis on their end already.
More sophisticated techniques include taint analysis. This works by marking all data from a “source” (user-controlled input) as tainted. Any data derived from tainted data is also tainted. The flow of tainted data is traced statically until a “sink” (potentially dangerous function) is reached. Developers should check for proper sanitisation of tainted data that can reach a sink.
Although more advanced, this only provides a partial solution to one of the cons, by ensuring that reported bugs are reachable by an external user. I say partial because the tainted data in the deeper layers might not be freely controllable from the surface.
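To make the source-to-sink idea concrete, here is a minimal C sketch of my own (the function names are hypothetical):
```c
#include <stdio.h>
#include <string.h>

// Sink: strcpy() with no length check. If msg is tainted and longer than
// 63 bytes, this is a stack-based buffer overflow.
void log_message(const char *msg) {
    char line[64];
    strcpy(line, msg);
    puts(line);
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    // Source: argv[1] is user controlled, so taint analysis marks it tainted.
    // The taint propagates into log_message(), reaching the strcpy() sink
    // without any sanitisation in between.
    log_message(argv[1]);
    return 0;
}
```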
Dynamic Analysis
Dynamic analysis includes any form of analysis that examines the application at runtime. It should be seen as complementary to static analysis, because both types of analysis have their drawbacks.
Dynamic analysis techniques include fuzzing and fault injection. The latter works by inducing faulty states in the product at runtime, with the aim of measuring how well the product reacts to corrupted inputs. This sounds really similar to fuzzing, but was originally aimed more at quality assurance.
Pros:
- Every crash is sure to have a ready-to-use PoC already generated.
- Mostly automated, manual work only required during setup and triage.
Cons:
- Might not reach a good amount of code coverage.
- Long time required to conduct a test.
- Only a finite set of inputs can be tested.
The development of fuzzers is geared towards solving the first two cons and reducing the work required for setup and triage, as you will read later on.
Fuzzing
In the initial stages of fuzzing, around 20 years ago, fuzzers were predominantly “blind”. The most rudimentary fuzzers work by throwing random input, at best following a certain user-defined structure, at a program and then logging inputs that resulted in a crash. An example is the SPIKE framework.
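As a sketch of my own of just how simple this can be (./target is a placeholder for the program under test, and no real fuzzer is quite this naive):
```c
// A minimal "blind" file fuzzer sketch: write random bytes to a file, run
// the target on it, and keep the input whenever the target is killed by a
// signal (i.e. it crashed).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    srand((unsigned)time(NULL));
    for (int i = 0; i < 1000; i++) {
        // Generate a completely random test case.
        FILE *f = fopen("testcase.bin", "wb");
        if (!f) return 1;
        int len = 1 + rand() % 4096;
        for (int j = 0; j < len; j++)
            fputc(rand() % 256, f);
        fclose(f);

        // Run the target (placeholder path) on the file.
        pid_t pid = fork();
        if (pid == 0) {
            freopen("/dev/null", "w", stdout);
            freopen("/dev/null", "w", stderr);
            execl("./target", "target", "testcase.bin", (char *)NULL);
            _exit(127);
        }
        int status = 0;
        waitpid(pid, &status, 0);

        // A signal-terminated child means we found a crash: keep the input.
        if (WIFSIGNALED(status)) {
            char name[64];
            snprintf(name, sizeof(name), "crash_%d_sig%d.bin", i, WTERMSIG(status));
            rename("testcase.bin", name);
            printf("crash on iteration %d\n", i);
        }
    }
    return 0;
}
```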
As the art of fuzzing evolved, the concept of code coverage became more and more important.
Fuzzing will be extremely ineffective if only a fixed, small part of the code is being executed to deal with each test case. This usually reflects a problem with either the test cases or the harness, such that execution is stuck within a particular loop and unable to reach other parts of the code.
In order for the fuzzer to receive feedback, instrumentation is required. Instrumentation can be added at compile time or at runtime. Runtime instrumentation is done with dynamic analysis tools such as DynamoRIO, Valgrind or PaiMei. Compile-time instrumentation includes the previously mentioned ASAN, which helps turn subtle bugs into crashes for the fuzzer to detect.
Some fuzzers like AFL have built-in instrumentation that allows the fuzzer to receive precise feedback on the input it just sent. AFL will attempt to create new test cases with the help of this feedback. This is an example of an evolutionary fuzzer.
Finally, it was mentioned that all kinds of instrumentation slow the target down, so it is important to strike a balance between speed of execution and feedback received.
Types of fuzzers
Just now I mentioned evolutionary fuzzers. There are some more types of modern fuzzers.
Mutational Fuzzers
A mutational fuzzer works by mutating parts of the test case in an attempt to gain coverage.
Given an initial test case, it will be byte flipped, have its sections deleted, copied, spliced and so on.
A good initial corpus should be a valid input to the program, so the fuzzer does not have to “fuzz out the format” to even start getting coverage.
An example of a mutational fuzzer is Zulu from NCC Group.
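A sketch of my own of what these mutation operations can look like on a byte buffer (real mutational fuzzers implement many more strategies):
```c
#include <stdlib.h>
#include <string.h>

// Mutate `buf` of length `len` in place with a few random operations.
void mutate(unsigned char *buf, size_t len) {
    if (len == 0) return;
    int ops = 1 + rand() % 8;
    for (int i = 0; i < ops; i++) {
        size_t pos = rand() % len;
        switch (rand() % 3) {
        case 0: // Flip a single bit.
            buf[pos] ^= 1u << (rand() % 8);
            break;
        case 1: // Overwrite a byte with a random value.
            buf[pos] = rand() % 256;
            break;
        case 2: { // Copy (splice) one chunk of the buffer over another.
            size_t src = rand() % len;
            size_t n = 1 + rand() % 16;
            if (src + n > len) n = len - src;
            if (pos + n > len) n = len - pos;
            memmove(buf + pos, buf + src, n);
            break;
        }
        }
    }
}
```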
Pros:
- Easy to set up
- Can be used against multiple programs, even closed-source ones
Cons:
- Quite dumb; lots of test cases might not conform to the program’s specifications and will be discarded straight away (for example by programs that demand a valid checksum).
- Heavily dependent on the initial corpus to get it started. With a bad corpus, it will take an extremely long time (or forever) to figure out the format that the program accepts.
One way to deal with the first con is to comment or patch out the part of the code that checks the file format so our test cases can pass.
That, however, might produce crash cases that are not triggerable against a fresh copy of the program. A better way is to write a harness that patches the correct file format onto the mutated data, as sketched below.
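For example (my own sketch, assuming a hypothetical format whose files end with a 4-byte big-endian CRC32 of the payload), the harness can recompute the checksum after every mutation so the target’s integrity check still passes:
```c
// Fix-up step a harness might apply after mutation: recompute a CRC32
// checksum over the mutated payload. The trailing-checksum layout here is
// hypothetical. Link with -lz for zlib's crc32().
#include <stdint.h>
#include <zlib.h>

// buf layout (assumed): [payload ...][4-byte big-endian CRC32 of payload]
void fix_checksum(unsigned char *buf, size_t len) {
    if (len < 4) return;
    size_t payload_len = len - 4;
    uint32_t crc = (uint32_t)crc32(0L, buf, (uInt)payload_len);
    buf[payload_len + 0] = (crc >> 24) & 0xff;
    buf[payload_len + 1] = (crc >> 16) & 0xff;
    buf[payload_len + 2] = (crc >> 8) & 0xff;
    buf[payload_len + 3] = crc & 0xff;
}
```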
Generation Fuzzer
Generation fuzzing works by following a specification. For example, in order to fuzz an ELF parser, you can provide the ELF specification to the fuzzer so that it generates slightly incorrect ELF files that still conform to a certain structure, so the parser will not instantly reject them for reasons like not having a proper header.
An example of a generation fuzzer is Sulley.
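As an illustration of my own (not how Sulley itself works), a generator targeting an ELF parser might keep the fields the specification requires valid, while fuzzing the ones that parsing code tends to trust:
```c
// Emit a structurally valid 64-bit ELF header, then fuzz selected fields so
// the parser accepts the file but exercises edge cases.
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    Elf64_Ehdr ehdr;
    memset(&ehdr, 0, sizeof(ehdr));

    // Keep the parts the spec says must be valid, so the file is not
    // rejected immediately.
    memcpy(ehdr.e_ident, ELFMAG, SELFMAG);
    ehdr.e_ident[EI_CLASS] = ELFCLASS64;
    ehdr.e_ident[EI_DATA] = ELFDATA2LSB;
    ehdr.e_ident[EI_VERSION] = EV_CURRENT;
    ehdr.e_type = ET_EXEC;
    ehdr.e_machine = EM_X86_64;
    ehdr.e_ehsize = sizeof(Elf64_Ehdr);

    // Fuzz fields that downstream parsing code trusts: section header
    // offset/count and the string table index.
    ehdr.e_shoff = rand();
    ehdr.e_shnum = rand() % 0xffff;
    ehdr.e_shstrndx = rand() % 0xffff;

    FILE *f = fopen("generated.elf", "wb");
    if (!f) return 1;
    fwrite(&ehdr, sizeof(ehdr), 1, f);
    fclose(f);
    return 0;
}
```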
Pros:
- Focused on fuzzing the fuzzable parts.
- Grammar and specifications help the fuzzer quickly understand the format that the program demands.
Cons:
- Might be difficult to obtain specifications for a closed-source program; lots of reversing to be done.
- One set of rules applies to one program; time has to be spent setting up the fuzzer again with new specifications for another target.
Certain file formats (like PNG) define features that are not in common use, and thus cannot be found in typical PNGs downloaded from the web. Ad verbum:
“In this situation the experiments found that generating test cases resulted in 76% more code coverage than mutation of PNG files downloaded from the Internet.”
Evolutionary Fuzzer
An evolutionary fuzzer creates new test cases based on feedback from the instrumentation, in an attempt to work towards a certain goal. It can be further split into coverage-based, taint-based and symbolic-assisted.
It is much smarter than generation and mutational fuzzers, because it actually reasons about the next input to provide instead of randomly flipping bytes. This is also a type of fuzzer where AI can come in to assist.
Although it is possible to fuzz any input that the program can consume, most fuzzers are generally either fuzzing file formats or protocols.
File format fuzzers target applications that receive a file as input, such as image viewers, movie players, PDF readers, word processors and audio file players.
Protocol fuzzers target communication protocols between programs over the internet. Targets include mail servers, webservers, FTP clients and many more.
Code Coverage
Research has shown that the number of vulnerabilities found correlates with the amount of code coverage the fuzzer has obtained.
There are three common approaches to measuring code coverage:
- Line coverage
- Branch coverage
- Path coverage
Line coverage is the most basic form of code coverage. It measures the number of source lines executed.
Quoting from the paper verbatim:
“The shortcoming is that for example a conditional statement can be marked as executed as soon as the condition is tested, whether or not it evaluated to true, if it is all on one line. To give a pseudo code example, take the following line.”
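The paper gives a one-line conditional along the lines of the following (my own reconstruction, not the paper’s exact code):
```c
// Hypothetical one-line conditional (not the paper's exact example). Line
// coverage marks the whole line as executed once the condition is tested,
// even though only one of the two assignments actually ran.
#include <stdio.h>

int main(void) {
    int x = 7, y = 0, z = 0;
    if (x == 7) { y = 8; } else { z = 9; }
    printf("y=%d z=%d\n", y, z);
    return 0;
}
```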
Branch coverage is the measure of the conditional branches that have been executed. Taking the conditional above as an example, there are two branches to test: the one where y is set to 8 and the one where z is set to 9.
Path coverage is even more detailed. Each unique sequence of lines and branches taken is a path.
Ad verbum:
“In general, a program with n reachable branches will require 2n test cases for branch coverage and 2^n test cases for path coverage.”
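To make the difference concrete, here is a small example of my own with three independent conditionals: branch coverage needs each condition exercised both ways, while path coverage needs all 2^3 = 8 combinations:
```c
// Three independent conditionals: branch coverage needs each of the three
// conditions taken both true and false, while path coverage needs all
// 2^3 = 8 true/false combinations, one per unique path.
#include <stdio.h>

void classify(int a, int b, int c) {
    if (a > 0)  puts("a positive");  else puts("a non-positive");
    if (b % 2)  puts("b odd");       else puts("b even");
    if (c > 10) puts("c large");     else puts("c small");
}

int main(void) {
    classify(1, 1, 11);   // path: true, true, true
    classify(-1, 2, 0);   // path: false, false, false
    // These two inputs exercise both sides of every condition, yet cover
    // only 2 of the 8 possible paths.
    return 0;
}
```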
Code coverage instrumentation can be added at compile time (for example with gcc’s --coverage flag), and the results are then reported by tools like gcov and lcov.
Coverage-Based Evolutionary Fuzzer
Now that we understand code coverage, we can discuss coverage-based evolutionary fuzzers.
These fuzzers have a goal of reaching greater coverage, and favour test cases that result in new code paths. AFL is a powerful coverage-based evolutionary fuzzer. It has an algorithm to evolve the group of test cases that cover the maximum amount of code. In a couple of days it is capable of producing valid JPEG files, starting from nothing but the string “hello” in a text file.
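A minimal sketch of what an AFL file-parsing harness can look like (parse_image() is a hypothetical stand-in for the code under test, and the command lines in the comment assume a standard AFL setup):
```c
// Minimal AFL-style harness: read the whole input file named on the command
// line and hand it to the code under test.
// Typical (assumed) workflow:
//   afl-gcc -o harness harness.c target.c
//   afl-fuzz -i seeds/ -o findings/ -- ./harness @@
#include <stdio.h>
#include <stdlib.h>

int parse_image(const unsigned char *data, size_t len);  // code under test

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (size <= 0) { fclose(f); return 1; }

    unsigned char *data = malloc((size_t)size);
    if (!data) { fclose(f); return 1; }
    if (fread(data, 1, (size_t)size, f) != (size_t)size) {
        free(data); fclose(f); return 1;
    }
    fclose(f);

    parse_image(data, (size_t)size);  // a bug here shows up as a crash
    free(data);
    return 0;
}
```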
Taint-Based Evolutionary Fuzzer
The goal of these fuzzers is not maximum code coverage, but reaching potentially dangerous functions.
For example, a prototype created by Lanzi et al. taint-analyses the program statically to find code paths that lead to a potentially dangerous function, and uses this information to guide the fuzzer to evolve inputs that exercise those specific code paths.
The taint analysis can be performed statically or dynamically.
This is quite an effective way of fuzzing when time is limited and you just want to drive test cases to execute those specific functions.
Symbolic-Assisted Evolutionary Fuzzer
These fuzzers combine symbolic execution with fuzzing.
Their goal is also to increase code coverage.
Unlike coverage-based fuzzers, which use some form of trial and error to find test cases that increase code coverage, symbolic-assisted fuzzers use dynamic symbolic execution engines to derive input changes that are known to result in the execution of a specific code segment (one that was previously uncovered).
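An example of my own of the kind of branch where this pays off: random mutation is very unlikely to guess a 32-bit magic value, but a symbolic execution engine collects the comparison as a constraint and asks a solver for an input that satisfies it.
```c
#include <stdint.h>
#include <string.h>

void process(const uint8_t *data, size_t len) {
    if (len < 8) return;
    uint32_t magic;
    memcpy(&magic, data, sizeof(magic));
    if (magic == 0xDEADBEEFu) {
        // Only reachable for 1 in 2^32 random 4-byte prefixes; a symbolic
        // engine records the constraint "magic == 0xDEADBEEF" along this
        // path and solves for an input that reaches the code below.
        char buf[4];
        memcpy(buf, data + 4, len - 4);   // deliberate overflow if len > 8
    }
}
```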
Ad verbum:
“The target is executed in an emulated environment and constraints on […]”
An example is the Scalable Automated Guided Execution (SAGE) fuzzer, which Microsoft uses internally.
I have no idea how symbolic execution works under the hood, but it seems magical. That shall be the topic of the next paper to read!
Conclusion
The paper provides an overview of static and dynamic analysis methods available as of 2017.
Specific to dynamic analysis, it also talks about the different types of fuzzers, their pros and cons, as well as a high level overview of how they function.
Afterwards, it discusses fuzzing web browsers, which is not my current field of research, so I skipped that part.
From a reader’s point of view, it introduces many fuzzing terms in an easy-to-digest manner, and provides the corpus for me to research and peruse other papers that discuss these topics in more detail.
The next paper to read should be on symbolic execution :)