Fuzzing 5

2023-09-23

Code review of the Jackalope fuzzer

Introduction

After the past month of mostly looking at CVEs, it’s time to go back to basics.

This post is an analysis and code review of the fuzzing tool Jackalope, designed by Google Project Zero as a customizable and scalable fuzzer for black-box binaries.

Let’s be honest, WinAFL and other AFL forks are quite nasty to read and modify.
Jackalope, written in C++, offers a much cleaner codebase and greater extensibility.

Another reason is that WinAFL uses DynamoRIO, which is akin to a heavy rocket launcher for program instrumentation.
For fuzzing that’s overkill, so Jackalope ships with Project Zero’s own lightweight instrumentation library, TinyInst.

In this post we’ll go through the core components of a modern mutation-based, coverage-guided fuzzer and how Jackalope implements them:

Architecture and Extensibility
Mutation Engine
Delivery and Persistence
Coverage and Detection

Architecture and Extensibility

In the main function, a Fuzzer object is expected to be instantiated with its Run() method invoked.

Fuzzer* fuzzer;
fuzzer = new CustomFuzzer(grammar);
fuzzer->Run(argc, argv);

CustomFuzzer is an arbitrary user defined child class of Fuzzer.

class CustomFuzzer : public Fuzzer {
  Mutator *CreateMutator(int argc, char **argv, ThreadContext *tc) override;
};

Mutator *CustomFuzzer::CreateMutator(int argc, char **argv, ThreadContext *tc)
{
  // construct and return the Mutator this fuzzer should use (see the example below)
}

Fuzzer exposes several virtual methods for the child to override, such as CreateMutator(), which allows custom mutator configuration.
For a generic fuzzer, we can return a probabilistic mutator, PSelectMutator, which picks one of its child mutators for each mutation according to a user-specified weight:

PSelectMutator *pselect = new PSelectMutator();

// select one of the mutators below with corresponding
// probabilities
pselect->AddMutator(new ByteFlipMutator(), 0.8);
pselect->AddMutator(new ArithmeticMutator(), 0.2);
pselect->AddMutator(new AppendMutator(1, 128), 0.2);
pselect->AddMutator(new BlockInsertMutator(1, 128), 0.1);
pselect->AddMutator(new BlockFlipMutator(2, 16), 0.1);
pselect->AddMutator(new BlockFlipMutator(16, 64), 0.1);
pselect->AddMutator(new BlockFlipMutator(1, 64, true), 0.1);
pselect->AddMutator(new BlockDuplicateMutator(1, 128, 1, 8), 0.1);

return pselect;

PSelectMutator is itself a subclass of Mutator, and Mutator can in turn be subclassed to implement custom mutation.

For example, to implement j00ru’s idea of giving each section of a file its own mutation probability, we can override CreateMutator() to create a PSelectMutator for each section, and write a Mutator subclass whose Mutate() splits each sample into its sections before delegating to the respective PSelectMutator’s Mutate().
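A minimal sketch of the per-section idea follows, assuming a sample is a flat byte buffer whose section boundaries are already known. The `Section` struct and `MutateSections()` function are hypothetical, not part of Jackalope; where a real implementation would hand each selected section to its own PSelectMutator, this one simply inverts a single random byte so the effect is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical description of one region of a file format (e.g. header, body).
struct Section {
  size_t offset;       // start of the section within the sample
  size_t size;         // length of the section in bytes
  double probability;  // chance that this section is mutated in one pass
};

// Mutates each section independently. A real implementation would pass the
// section's bytes to a per-section PSelectMutator; here we invert one random
// byte so that a mutation is always observable.
void MutateSections(std::vector<uint8_t>& sample,
                    const std::vector<Section>& sections,
                    std::mt19937& rng) {
  std::uniform_real_distribution<double> coin(0.0, 1.0);  // yields [0, 1)
  for (const Section& s : sections) {
    assert(s.size > 0 && s.offset + s.size <= sample.size());
    if (coin(rng) >= s.probability) continue;  // section not selected this time
    std::uniform_int_distribution<size_t> pick(s.offset, s.offset + s.size - 1);
    sample[pick(rng)] ^= 0xFF;  // inverting all bits always changes the byte
  }
}
```

A probability of 0.0 leaves a section untouched and 1.0 mutates it on every pass, which makes it easy to, say, protect a checksum-bearing header while hammering the body.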

A similar principle applies to other components such as sample delivery and instrumentation (yes, most instrumentation methods are virtual, so you can plug in custom instrumentation).

class Instrumentation {
public:
  virtual ~Instrumentation() { }

  virtual void Init(int argc, char **argv) = 0;
  virtual RunResult Run(int argc, char **argv, uint32_t init_timeout, uint32_t timeout) = 0;

  virtual RunResult RunWithCrashAnalysis(int argc, char** argv, uint32_t init_timeout, uint32_t timeout) {
    return Run(argc, argv, init_timeout, timeout);
  }

  virtual void CleanTarget() = 0;

  virtual bool HasNewCoverage() = 0;
  virtual void GetCoverage(Coverage &coverage, bool clear_coverage) = 0;
  virtual void ClearCoverage() = 0;
  virtual void IgnoreCoverage(Coverage &coverage) = 0;

  virtual std::string GetCrashName() { return "crash"; };

  virtual uint64_t GetReturnValue() { return 0; }

  std::string AnonymizeAddress(void* addr);
};

Jackalope supports two modes: server/client mode for synchronisation across machines, as well as local multithreaded mode.
For the purpose of this article I’ll focus on the latter.

The main fuzzer thread starts num_threads worker threads:

for (int i = 1; i <= num_threads; i++) {
  ThreadContext *tc = CreateThreadContext(argc, argv, i);
  CreateThread(StartFuzzThread, tc);
}

Each worker thread repeatedly requests a job, which tells it to wait, process a sample, or fuzz a sample:

void Fuzzer::RunFuzzerThread(ThreadContext *tc) {
  while (1) {
    FuzzerJob job;

    SynchronizeAndGetJob(tc, &job);

    switch (job.type) {
    case WAIT:
#if defined(WIN32) || defined(_WIN32) || defined(__WIN32)
      Sleep(1000);
#else
      usleep(1000000);
#endif
      break;
    case PROCESS_SAMPLE:
      ProcessSample(tc, &job);
      break;
    case FUZZ:
      FuzzJob(tc, &job);
      break;
    default:
      FATAL("Unknown job type");
      break;
    }

    JobDone(&job);
  }
}
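The dispatch pattern can be sketched in a self-contained way with a mutex-protected queue. Everything here is hypothetical: `JobQueue` stands in for SynchronizeAndGetJob()/JobDone() (which in Jackalope also synchronize corpus and coverage state), the STOP job type exists only so the example terminates, and an empty queue yields a WAIT job that merely yields the CPU instead of sleeping for a second.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

enum JobType { WAIT, PROCESS_SAMPLE, FUZZ, STOP };  // STOP is added for this sketch

struct Job {
  JobType type;
  int sample_id;
};

class JobQueue {
 public:
  void Push(Job job) {
    std::lock_guard<std::mutex> lock(mu_);
    jobs_.push_back(job);
  }

  // Hands out the next job, or a WAIT job when nothing is queued.
  Job Get() {
    std::lock_guard<std::mutex> lock(mu_);
    if (jobs_.empty()) return Job{WAIT, -1};
    Job job = jobs_.front();
    jobs_.pop_front();
    return job;
  }

 private:
  std::mutex mu_;
  std::deque<Job> jobs_;
};

struct Stats {
  std::atomic<int> processed{0};
  std::atomic<int> fuzzed{0};
};

// Worker loop with the same shape as Fuzzer::RunFuzzerThread().
void WorkerLoop(JobQueue& queue, Stats& stats) {
  while (true) {
    Job job = queue.Get();
    switch (job.type) {
      case WAIT:
        std::this_thread::yield();  // Jackalope sleeps for one second here
        break;
      case PROCESS_SAMPLE:
        stats.processed++;  // placeholder for ProcessSample()
        break;
      case FUZZ:
        stats.fuzzed++;     // placeholder for FuzzJob()
        break;
      case STOP:
        return;
    }
  }
}
```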

ProcessSample() runs a sample from the input corpus once, without mutation, to see whether it misbehaves.
This doesn’t seem too useful.

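FuzzJob() repeatedly mutates a copy of the selected sample and runs it through the target until the mutator’s Mutate() returns false, skipping mutated samples that exceed the maximum size: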

while (1) {
  Sample mutated_sample = *entry->sample;
  if (!tc->mutator->Mutate(&mutated_sample, tc->prng, tc->all_samples_local)) break;
  if (mutated_sample.size > Sample::max_size) {
    continue;
  }

  int has_new_coverage;
  RunResult result = RunSample(tc, &mutated_sample, &has_new_coverage, true, true, init_timeout, timeout, entry->sample);
}

It also updates the stats table displayed by the main thread.

Details of how a sample is run are covered below under Delivery and Persistence.

Mutation Engine

Jackalope comes with a probabilistic mutator and several common mutators such as byte flip, block duplicate, append and splice.
Interestingly, it doesn’t come with a bit flip mutator; perhaps bit flips have proven less worthwhile in practice.

PSelectMutator has an interesting algorithm:

double p = prng->RandReal() * psum;
double sum = 0;
for (int i = 0; i < child_mutators.size(); i++) {
  sum += probabilities[i];
  if ((p < sum) || (i == (child_mutators.size() - 1))) {
    last_mutator_index = i;
    Mutator *current_mutator = child_mutators[i];
    return current_mutator->Mutate(inout_sample, prng, all_samples);
  }
}

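This is roulette-wheel selection. psum is the running total of the weights passed to AddMutator(), so, assuming RandReal() returns a value in [0, 1), p lands uniformly in [0, psum), and the loop picks the first mutator whose cumulative weight exceeds p. Each mutator is therefore chosen with probability weight / psum, which is why the weights in the CreateMutator() example don’t need to add up to 1 (they sum to 1.8). The check on the last index guards against floating-point rounding.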
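The same selection logic can be written as a standalone function that is easy to test. `WeightedSelect()` is a hypothetical name, and it takes the uniform random number as a parameter (assumed to be in [0, 1)) instead of calling a PRNG, so the result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Picks index i with probability weights[i] / sum(weights), given a uniform
// random number r in [0, 1). Mirrors the cumulative-sum walk in PSelectMutator.
size_t WeightedSelect(const std::vector<double>& weights, double r) {
  assert(!weights.empty() && r >= 0.0 && r < 1.0);
  double psum = std::accumulate(weights.begin(), weights.end(), 0.0);
  double p = r * psum;  // scale the random number onto [0, psum)
  double sum = 0;
  for (size_t i = 0; i < weights.size(); i++) {
    sum += weights[i];
    // The last-index check covers p ending up marginally above the final
    // cumulative sum because of floating-point rounding.
    if (p < sum || i == weights.size() - 1) return i;
  }
  return weights.size() - 1;  // not reached; silences missing-return warnings
}
```

With the CreateMutator() weights above, ByteFlipMutator would be picked with probability 0.8 / 1.8, roughly 44%.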

Apart from this, there’s not much to talk about.
Mutation is meant to be customized anyway.

Delivery and Persistence

The functionality required to run the target process with a testcase is fully implemented by TinyInst, hence the fuzzer just needs to call the API:

Fuzzer::FuzzJob()

RunResult result = RunSample(tc, &mutated_sample, &has_new_coverage, true, true, init_timeout, timeout, entry->sample);

Fuzzer::RunSample()

RunResult result = RunSampleAndGetCoverage(tc, sample, &initialCoverage, init_timeout, timeout);

Fuzzer::RunSampleAndGetCoverage()

RunResult result = tc->instrumentation->Run(tc->target_argc, tc->target_argv, init_timeout, timeout);
tc->instrumentation->GetCoverage(*coverage, true);

// save crashes and hangs immediately when they are detected
if (result == CRASH) {
  string crash_desc = tc->instrumentation->GetCrashName();

The functions above simply delegate down to the lowest-layer Run() method, implemented by each instrumentation engine.

TinyInstInstrumentation::Run()

RunResult TinyInstInstrumentation::Run(int argc, char **argv, uint32_t init_timeout, uint32_t timeout) {
  DebuggerStatus status;
  RunResult ret = OTHER_ERROR;

  if (instrumentation->IsTargetFunctionDefined()) {
    if (cur_iteration == num_iterations) {
      instrumentation->Kill();
      cur_iteration = 0;
    }
  }
  
  // else clear only when the target function is reached
  if (!instrumentation->IsTargetFunctionDefined()) {
    instrumentation->ClearCoverage();
  }

  uint32_t timeout1 = timeout;
  if (instrumentation->IsTargetFunctionDefined()) {
    timeout1 = init_timeout;
  }

  if (instrumentation->IsTargetAlive() && persist) {
    status = instrumentation->Continue(timeout1);
  } else {
    instrumentation->Kill();
    cur_iteration = 0;
    status = instrumentation->Run(argc, argv, timeout1);
  }

  // if target function is defined,
  // we should wait until it is hit
  if (instrumentation->IsTargetFunctionDefined()) {
    if (status != DEBUGGER_TARGET_START) {
      // try again with a clean process
      WARN("Target function not reached, retrying with a clean process\n");
      instrumentation->Kill();
      cur_iteration = 0;
      status = instrumentation->Run(argc, argv, init_timeout);
    }

    if (status != DEBUGGER_TARGET_START) {
      switch (status) {
      case DEBUGGER_CRASHED:
        FATAL("Process crashed before reaching the target method\n");
        break;
      case DEBUGGER_HANGED:
        FATAL("Process hanged before reaching the target method\n");
        break;
      case DEBUGGER_PROCESS_EXIT:
        FATAL("Process exited before reaching the target method\n");
        break;
      default:
        FATAL("An unknown problem occured before reaching the target method\n");
        break;
      }
    }

    instrumentation->ClearCoverage();

    status = instrumentation->Continue(timeout);
  }

  switch (status) {
  case DEBUGGER_CRASHED:
    ret = CRASH;
    instrumentation->Kill();
    break;
  case DEBUGGER_HANGED:
    ret = HANG;
    instrumentation->Kill();
    break;
  case DEBUGGER_PROCESS_EXIT:
    ret = OK;
    if (instrumentation->IsTargetFunctionDefined()) {
      WARN("Process exit during target function\n");
      ret = HANG;
    }
    break;
  case DEBUGGER_TARGET_END:
    if (instrumentation->IsTargetFunctionDefined()) {
      ret = OK;
      cur_iteration++;
    } else {
      FATAL("Unexpected status received from the debugger\n");
    }
    break;
  default:
    FATAL("Unexpected status received from the debugger\n");
    break;
  }

  return ret;
}

This function manages calls into the actual Run() function residing in TinyInst.
In persistent mode, it first checks whether the process needs to be restarted because the maximum number of iterations has been reached.
It then continues the process if it is already running, or starts it if it isn’t.
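The restart/continue decision can be modeled without a real debugger. This is a simplified sketch: `FakeTarget` and `RunManager` are hypothetical, crashes, hangs and the wait for the target function are ignored, and `cur_iteration` is incremented on every run, whereas TinyInstInstrumentation::Run() increments it when DEBUGGER_TARGET_END is received.

```cpp
#include <cassert>

// Stand-in for the debugged process; only counts what happens to it.
struct FakeTarget {
  bool alive = false;
  int starts = 0;     // fresh process launches
  int continues = 0;  // times a live process was resumed
  void Kill() { alive = false; }
  void Start() { alive = true; starts++; }
  void Continue() { continues++; }
};

// Hypothetical model of the decision logic in TinyInstInstrumentation::Run().
struct RunManager {
  FakeTarget& target;
  int num_iterations;  // iterations allowed before a forced restart
  bool persist;
  int cur_iteration = 0;

  void RunOnce() {
    // Restart once the iteration budget has been used up.
    if (cur_iteration == num_iterations) {
      target.Kill();
      cur_iteration = 0;
    }
    if (target.alive && persist) {
      target.Continue();  // reuse the running process
    } else {
      target.Kill();
      cur_iteration = 0;
      target.Start();     // launch a clean process
    }
    cur_iteration++;      // Jackalope does this on DEBUGGER_TARGET_END
  }
};
```

With a budget of three iterations, seven persistent runs need only three process launches; without persistence, every run pays for a fresh launch.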

This works because, in persistent mode, TinyInst places a breakpoint on the target function when its module is loaded:

// called when a module gets loaded
void Debugger::OnModuleLoaded(void *module, char *module_name) {
  // printf("In on_module_loaded, name: %s, base: %p\n", module_name, module_info.lpBaseOfDll);

  if (target_function_defined && _stricmp(module_name, target_module.c_str()) == 0) {
    target_address = GetTargetAddress((HMODULE)module);
    if (!target_address) {
      FATAL("Error determining target method address\n");
    }

    AddBreakpoint(target_address, BREAKPOINT_TARGET);
  }
}

Upon hitting the target function, it saves the stack pointer and the original return address (plus the function arguments in loop mode), then overwrites the return address so that an exception is raised when the function returns.

// called when the target method is reached
void Debugger::HandleTargetReachedInternal() {
  // printf("in OnTargetMethod\n");

  SIZE_T numrw = 0;

  saved_sp = (void *)GetRegister(RSP);

  saved_return_address = 0;
  ReadProcessMemory(child_handle, saved_sp, &saved_return_address, child_ptr_size, &numrw);

  if (loop_mode) {
    GetFunctionArguments(saved_args, target_num_args, (uint64_t)saved_sp, calling_convention);

    // todo store any target-specific additional context here
  }

  // modify the return address on the stack so that an exception is triggered
  // when the target function finishes executing
  // another option would be to allocate a block of executable memory
  // and point return address over there, but this is quicker
  size_t return_address = PERSIST_END_EXCEPTION;
  WriteProcessMemory(child_handle, saved_sp, &return_address, child_ptr_size, &numrw);
}

This exception will be caught, notifying TinyInst that the target function has ended.

case EXCEPTION_ACCESS_VIOLATION: {
  if (target_function_defined && 
     ((size_t)exception_record->ExceptionAddress == PERSIST_END_EXCEPTION))
  {
    if (trace_debug_events) printf("Debugger: Persistence method ended\n");
    HandleTargetEnded();
    return DEBUGGER_TARGET_END;
  }

In loop mode, it then restores the arguments and points the instruction pointer back at the start of the target function; otherwise, it resumes at the saved return address and re-arms the entry breakpoint:

// called every time the target method returns
void Debugger::HandleTargetEnded() {
  // printf("in OnTargetMethodEnded\n");

  target_return_value = GetRegister(RAX);

  if (loop_mode) {
    // restore params

    // Writing to lcContext directly to avoid calling 
    // SetThreadContext multiple times.
    // We don't need to RetrieveThreadContext() as it was done in 
    // GetRegister() above and we don't need to SetThreadContext
    // as it will be called by SetFunctionArguments below
#ifdef _WIN64
    lcContext.Rip = (size_t)target_address;
    lcContext.Rsp = (size_t)saved_sp;
#else
    lcContext.Eip = (size_t)target_address;
    lcContext.Esp = (size_t)saved_sp;
#endif

    // restore return address as it might have been overwritten by instrumentation
    SIZE_T numrw = 0;
    size_t return_address = PERSIST_END_EXCEPTION;
    WriteProcessMemory(child_handle, saved_sp, &return_address, child_ptr_size, &numrw);

    SetFunctionArguments(saved_args, target_num_args, (uint64_t)saved_sp, calling_convention);

    // todo restore any target-specific additional context here

  } else { /*  loop_mode == false */

    SetRegister(RIP, (size_t)saved_return_address);

    // restore target entry breakpoint
    // note that this time, the breakpoint address might be
    // in instrumented code
    // so we need to use translated address
    AddBreakpoint((void *)GetTranslatedAddress((size_t)target_address),
                  BREAKPOINT_TARGET);
  }
}

This explains why the run manager in Jackalope can repeatedly attempt to Continue() the target.
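The return-address trick can be illustrated on a simulated thread. This is a toy model, not TinyInst code: `FakeThread`, `LoopHarness` and the sentinel value `kPersistEnd` are hypothetical, the stack is a vector, and the target takes a single argument. It covers only loop mode; the non-loop branch of HandleTargetEnded() would instead resume at `saved_return` and re-arm the entry breakpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for PERSIST_END_EXCEPTION: an address that is never mapped, so
// "returning" to it faults and hands control back to the debugger.
constexpr uint64_t kPersistEnd = 0x0F0F0F0F;

// Simplified register/stack state of the target thread.
struct FakeThread {
  std::vector<uint64_t> stack;  // stack[sp] holds the return address on entry
  size_t sp = 0;
  uint64_t ip = 0;
  uint64_t arg0 = 0;            // the target function's single argument
};

struct LoopHarness {
  uint64_t target_address;
  size_t saved_sp = 0;
  uint64_t saved_return = 0;
  uint64_t saved_arg0 = 0;
  int iterations = 0;

  // Like HandleTargetReachedInternal(): save context and plant the sentinel.
  void OnTargetReached(FakeThread& t) {
    saved_sp = t.sp;
    saved_return = t.stack[t.sp];
    saved_arg0 = t.arg0;
    t.stack[t.sp] = kPersistEnd;
  }

  // Simulates the target executing `ret`. Returns true if it landed on the
  // sentinel, in which case the thread is rewound for another iteration.
  bool OnReturn(FakeThread& t) {
    t.ip = t.stack[t.sp];
    if (t.ip != kPersistEnd) return false;
    t.ip = target_address;        // jump back to the start of the function
    t.sp = saved_sp;              // undo any stack pointer changes
    t.stack[t.sp] = kPersistEnd;  // re-plant the sentinel
    t.arg0 = saved_arg0;          // restore the original argument
    iterations++;
    return true;
  }
};
```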

Coverage and Detection

Coverage data is also kindly provided by TinyInst APIs.

On initialization, TinyInst marks the code sections of the instrumented module as non-executable and places instrumented copies of the code elsewhere in memory.
Therefore, whenever original code in the module is executed, an exception is raised and caught by TinyInst.
TinyInst then instruments the faulting basic block, together with all blocks reachable from it via direct jumps (followed recursively), in the copied code, and redirects control flow to the copy.
This avoids further exceptions along the same code path, since exception handling is expensive.
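The lazy translation scheme can be modeled with integers as basic-block addresses. This is a hypothetical simulation, not TinyInst’s implementation: `Module` lists each block’s direct-jump targets, and a fault is counted whenever execution reaches a block that has no instrumented copy yet. Blocks reachable only through indirect branches are deliberately left out of `direct_successors`, so reaching them costs another fault.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Block address -> addresses of blocks it reaches via direct jumps.
struct Module {
  std::map<int, std::vector<int>> direct_successors;
};

class LazyInstrumenter {
 public:
  explicit LazyInstrumenter(const Module& m) : module_(m) {}

  // Called for every block the program executes.
  void Execute(int block) {
    if (translated_.count(block)) return;  // runs from the instrumented copy
    faults_++;                             // original code is non-executable
    TranslateFrom(block);
  }

  int faults() const { return faults_; }
  size_t translated() const { return translated_.size(); }

 private:
  // Translates the faulting block plus everything reachable by direct jumps,
  // using an explicit worklist instead of recursion.
  void TranslateFrom(int block) {
    std::vector<int> work = {block};
    while (!work.empty()) {
      int b = work.back();
      work.pop_back();
      if (!translated_.insert(b).second) continue;  // already copied
      auto it = module_.direct_successors.find(b);
      if (it == module_.direct_successors.end()) continue;
      for (int succ : it->second) work.push_back(succ);
    }
  }

  const Module& module_;
  std::set<int> translated_;
  int faults_ = 0;
};
```

In the test scenario, executing block 1 costs one fault and translates blocks 1 to 4 in one go; block 10, reachable only indirectly in this model, costs a second fault.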

Edge Coverage

Edge instrumentation is achieved by inserting a mov instruction that writes 1 to a dedicated byte in the coverage bytemap.

First, a coverage code unique to each edge is generated from the offsets of its two endpoints relative to the module’s lowest address:

uint64_t LiteCov::GetEdgeCode(ModuleInfo *module, size_t edge_address1,
                              size_t edge_address2) {
  uint64_t offset1 = 0;
  if (edge_address1)
    offset1 = ((uint64_t)edge_address1 - (uint64_t)module->min_address);
  uint64_t offset2 = 0;
  if (edge_address2)
    offset2 = ((uint64_t)edge_address2 - (uint64_t)module->min_address);

  return ((offset1 << 32) + (offset2 & 0xFFFFFFFF));
}
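The encoding can be reproduced as a standalone function for experimentation. `EdgeCode()` and `DecodeEdge()` are hypothetical names; `module_base` plays the role of `module->min_address`. The scheme assumes every offset fits in 32 bits (modules under 4 GiB), and `DecodeEdge()` additionally assumes neither endpoint was 0, since a missing address encodes as offset 0.

```cpp
#include <cassert>
#include <cstdint>

// Source offset in the high 32 bits, destination offset in the low 32 bits.
// An address of 0 means "no address" and contributes an offset of 0.
uint64_t EdgeCode(uint64_t module_base, uint64_t from, uint64_t to) {
  uint64_t offset1 = from ? from - module_base : 0;
  uint64_t offset2 = to ? to - module_base : 0;
  return (offset1 << 32) + (offset2 & 0xFFFFFFFF);
}

// Inverse mapping, useful when turning collected codes back into addresses.
void DecodeEdge(uint64_t code, uint64_t module_base, uint64_t* from, uint64_t* to) {
  *from = module_base + (code >> 32);
  *to = module_base + (code & 0xFFFFFFFF);
}
```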

The hash table buf_to_coverage maps each bytemap index back to its coverage code:

data->buf_to_coverage[data->coverage_buffer_next] = coverage_code;

...

size_t bit_address =
    (size_t)data->coverage_buffer_remote + data->coverage_buffer_next;
size_t mov_address = GetCurrentInstrumentedAddress(module);
data->coverage_buffer_next++;

EmitCoverageInstrumentation(module, bit_address, mov_address);

Finally, the mov instruction is written into the copied (instrumented) code:

void LiteCov::EmitCoverageInstrumentation(ModuleInfo *module,
                                          size_t bit_address,
                                          size_t mov_address) {
  //////////////////////////////////////////////////
  // mov [coverage_buffer + coverage_buffer_next], 1
  //////////////////////////////////////////////////
  WriteCode(module, MOV_ADDR_1, sizeof(MOV_ADDR_1));
  mov_address += sizeof(MOV_ADDR_1);

When the bytemap is retrieved, it’s decoded using the buf_to_coverage table, and the resulting coverage codes are written to disk as coverage.
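The allocate/hit/collect cycle can be modeled locally. In TinyInst the buffer lives in the target process and is read remotely; here `CoverageBuffer` is a hypothetical in-process stand-in, with `Hit()` playing the role of the emitted mov and `Collect()` the decoding step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

// One byte per instrumented edge; the emitted mov writes 1 to it.
struct CoverageBuffer {
  std::vector<uint8_t> bytes;
  std::unordered_map<size_t, uint64_t> buf_to_coverage;  // byte index -> edge code

  // Instrumentation time: reserve the next byte for an edge.
  size_t Allocate(uint64_t edge_code) {
    size_t index = bytes.size();
    bytes.push_back(0);
    buf_to_coverage[index] = edge_code;
    return index;
  }

  // Run time: what `mov byte [buffer + index], 1` does.
  void Hit(size_t index) { bytes[index] = 1; }

  // Decodes all hit bytes into edge codes, optionally resetting them.
  std::set<uint64_t> Collect(bool clear) {
    std::set<uint64_t> codes;
    for (size_t i = 0; i < bytes.size(); i++) {
      if (!bytes[i]) continue;
      codes.insert(buf_to_coverage.at(i));
      if (clear) bytes[i] = 0;
    }
    return codes;
  }
};
```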

On Jackalope’s end it simply calls the API to run/get coverage:

RunResult result = tc->instrumentation->Run(tc->target_argc, tc->target_argv, init_timeout, timeout);
tc->instrumentation->GetCoverage(*coverage, true);

If the result is a crash or hang, it’s logged immediately to disk.
Otherwise, a check is performed to see if new coverage was generated.

CoverageDifference(tc->thread_coverage, initialCoverage, new_thread_coverage);
if (new_thread_coverage.empty()) return result;
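CoverageDifference() boils down to a set difference per module. The sketch below assumes a simplified coverage type, a map from module name to a set of edge codes; `CoverageMap` and `Difference()` are hypothetical and not Jackalope’s actual definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>

// Simplified coverage type: module name -> set of edge codes.
using CoverageMap = std::map<std::string, std::set<uint64_t>>;

// Returns the codes in `current` that are not yet in `known`.
CoverageMap Difference(const CoverageMap& known, const CoverageMap& current) {
  CoverageMap diff;
  for (const auto& entry : current) {
    const std::string& module = entry.first;
    const std::set<uint64_t>& codes = entry.second;
    std::set<uint64_t> fresh;
    auto it = known.find(module);
    if (it == known.end()) {
      fresh = codes;  // nothing known about this module yet
    } else {
      std::set_difference(codes.begin(), codes.end(), it->second.begin(),
                          it->second.end(), std::inserter(fresh, fresh.end()));
    }
    if (!fresh.empty()) diff[module] = std::move(fresh);
  }
  return diff;
}
```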

Mutated samples that produce new coverage are saved to disk and added to a priority queue, which prioritizes samples that have most recently contributed new coverage.

std::priority_queue<SampleQueueEntry *, std::vector<SampleQueueEntry *>, CmpEntryPtrs> sample_queue;
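A comparator for such a queue might look like the following. `SampleQueueEntry` here is a hypothetical, stripped-down struct, and treating `priority` as a counter stamped when a sample last found new coverage is an assumption made for illustration, not taken from Jackalope.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical queue entry; the real SampleQueueEntry holds more state.
struct SampleQueueEntry {
  int sample_index;
  uint64_t priority;  // assumed: larger means more recently found new coverage
};

// std::priority_queue is a max-heap: the comparator returns true when `a`
// should come out after `b`, so higher priority values are popped first.
struct CmpEntryPtrs {
  bool operator()(const SampleQueueEntry* a, const SampleQueueEntry* b) const {
    return a->priority < b->priority;
  }
};

using SampleQueue =
    std::priority_queue<SampleQueueEntry*, std::vector<SampleQueueEntry*>, CmpEntryPtrs>;
```

Because std::priority_queue has no decrease-key operation, an entry whose priority changes has to be popped and pushed again rather than updated in place.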

Conclusion

A mutation-based, coverage-guided fuzzer is really straightforward.
All you need is an instrumentation framework, a mutator and a detector.

The instrumentation can be either static (at compile time like afl-as, or at the binary level like pe-afl) or dynamic (DynamoRIO, Pin, TinyInst), and it is the most challenging part to implement.

The rest is just a dumb fuzzer.

If you want to implement a fuzzer from scratch, start with a dumb one first!