// Copyright (C) 2017-2024 Internet Systems Consortium, Inc. ("ISC") // // This Source Code Form is subject to the terms of the Mozilla Public // License, v. 2.0. If a copy of the MPL was not distributed with this // file, You can obtain one at http://mozilla.org/MPL/2.0/. /** @page fuzzer Fuzzing Kea @section fuzzIntro Introduction Fuzzing is a software-testing technique whereby a program is presented with a variety of generated data as input and is monitored for abnormal conditions such as crashes or hangs. There are two ways to fuzz Kea. Option 1. With the libfuzzer harness function LLVMFuzzerTestOneInput. Option 2. With the AFL (American Fuzzy Lop) compiler. @section LLVMFuzzerTestOneInput Using the LLVMFuzzerTestOneInput Harness Function This mode of fuzzing works with virtually any compiler. There are four types of fuzzers implemented with this mode: - Config fuzzer - HTTP endpoint fuzzer - Packet fuzzer - Unix socket fuzzer There are two binaries under test: - `kea-dhcp4` - `kea-dhcp6` Combining the binaries and the fuzzer types results in eight fuzzing binaries: - `fuzz/fuzz_config_kea_dhcp4` - `fuzz/fuzz_config_kea_dhcp6` - `fuzz/fuzz_http_endpoint_kea_dhcp4` - `fuzz/fuzz_http_endpoint_kea_dhcp6` - `fuzz/fuzz_packets_kea_dhcp4` - `fuzz/fuzz_packets_kea_dhcp6` - `fuzz/fuzz_unix_socket_kea_dhcp4` - `fuzz/fuzz_unix_socket_kea_dhcp6` @subsection HowToBuild How to Build the LLVM Fuzzer Use the "--enable-fuzzing" during the configure step. Then just compile as usual. @code ./configure --enable-fuzzing make @endcode You can check that `config.report` shows these lines: @code Developer: [...] Fuzzing: yes AFL: no @endcode Compiling with AFL is permitted, but is not required. @subsection HowToRun How to Run the LLVM Fuzzer Each of these binaries has two ways to run. It tries to find a directory called `input/` relative to the binary where `` is the name of the binary. - The first mode: if the directory exists, it recursively takes all the files from that directory and provides them as fuzz input one-by-one. All the fuzzers have an empty file and a one-byte file as inputs committed to the repository. Config fuzzers also have all the files in `doc/examples/kea[46]` symlinked. - The second mode: if the directory doesn't exist, then it accepts input from stdin, just like the old fuzzer did. In this mode, a fuzzer engine can be run on it. This is the mode used in CI. After compiling, all the fuzzers can be run with `make check` in the `fuzz` directory. The reasoning behind this is that while writing code, developers can quickly check if anything is broken. Obviously, this is not real fuzzing as long since the input from the `fuzz/input` directory is the same, but it rather tests if the fuzzers were broken during development. `make check` runs these fuzzers with `sudo`. It may interrupt the process asking for a password on systems that don't have passwordless root set up. @subsection FuzzingStructure The Code Structure of the LLVM Fuzzer The following functions are required to be implemented in each new fuzzer: - `int LLVMFuzzerInitialize();` - Does initialization that is required by the fuzzing. Is only run once. Is run automatically. It does not need to be run explicitly by the fuzzing engine. - `int LLVMFuzzerTearDown();` - Cleans up the setup like removing leftover files. Is automatically run at the beginning and the end of the fuzzing. It does not need to be run explicitly by the fuzzing engine. - `int LLVMFuzzerTestOneInput(uint8_t const* data, size_t size);` - Implements the actual fuzzing. Takes the parameter input and achieves the object of the fuzzing with it. It needs to start with `static bool initialized(DoInitialization());` to do the initialization only once. This function is the only one that needs to be run explicitly by the fuzzing engine. The following functions are common to all fuzzers: - `int main(int, char* argv[])` - Implements the input searching mentioned above. - `bool DoInitialization();` - Sets up logging to prevent spurious logging and calls `int LLVMFuzzerInitialize();`. - `void writeToFile(std::string const& file, std::string const& content);` - A helpful function to write to file used in some fuzzers. @subsection FuzzingConsiderations Development Considerations About LLVM Fuzzing in Kea Exceptions make it difficult to maintain a fuzzer. We have to triage some of the exceptions. For example, JSONError is thrown when an invalid JSON is provided as input in config fuzzing. That leads to a core dump and can be interpreted as a crash by the fuzzing engine, which is not really what we're interested in because this exception is caught in the kea-dhcp[46] binaries. The old way of fuzzing may have been better from this point of view, because there was the guarantee that the right exceptions were caught and nothing more and we didn't have to pay attention to what exceptions needed to be ignored and which weren't. @section usingAFL Using AFL In this, Kea is built using an AFL-supplied program that not only compiles the software but also instruments it. When run, AFL generates test cases and monitors the execution of Kea as it processes them. AFL will adjust the input based on these measurements, seeking to discover and test new execution paths. @subsection fuzzTypeNetwork Fuzzing with Network Packets In this mode, AFL will start an instance of Kea and send it a packet of data. Kea reads this packet and processes it in the normal way. AFL monitors code paths taken by Kea and, based on this, will vary the data sent in subsequent packets. @subsection fuzzTypeConfig Fuzzing with Configuration Files Kea has a configuration file check mode whereby it will read a configuration file, report whether the file is valid, then immediately exit. Operation of the configuration parsing code can be tested with AFL by fuzzing the configuration file: AFL generates example configuration files based on a dictionary of valid keywords and runs Kea in configuration file check mode on them. As with network packet fuzzing, the behaviour of Kea is monitored and the content of subsequent files adjusted accordingly. @subsection fuzzBuild Building Kea for Fuzzing Whatever tests are done, Kea needs to be built with fuzzing in mind. The steps for this are: -# Install AFL on the system on which you plan to build Kea and do the fuzzing. AFL may be downloaded from http://lcamtuf.coredump.cx/afl. At the time of writing (August 2019), the latest version is 2.52b. AFL should be built as per the instructions in the README file in the distribution. The LLVM-based instrumentation programs should also be built, as per the instructions in the file llvm_mode/README.llvm (also in the distribution). Note that this requires that LLVM be installed on the machine used for the fuzzing. -# Build Kea. Kea should be compiled and built as usual, although the following additional steps should be observed: - Set the environment variable CXX to point to the afl-clang-fast compiler. - Specify a value of "--prefix" on the command line to set the directory into which Kea is installed. - Add the "--enable-fuzzing" switch to the "configure" command line. . For example: @code CXX=afl-clang-fast ./configure --enable-fuzzing --prefix=$HOME/installed make @endcode -# Install Kea to the directory specified by "--prefix": @code make install @endcode This step is not strictly necessary, but makes running AFL easier. "libtool", used by the Kea build procedure to build executable images, puts the executable in a hidden ".libs" subdirectory of the target directory and creates a shell script in the target directory for running it. The wrapper script handles the fact that the Kea libraries on which the executable depends are not installed by fixing up the LD_LIBRARY_PATH environment variable to point to them. It is possible to set the variable appropriately and use AFL to run the image from the ".libs" directory; in practice, it is a lot simpler to install the programs in the directories set by "--prefix" and run them from there. @subsection fuzzRunNetwork Fuzzing with Network Packets -# In this type of fuzzing, Kea is processing packets from the fuzzer over a network interface. This interface could be a physical interface or it could be the loopback interface. Either way, it needs to be configured with a suitable IPv4 or IPv6 address depending on whether kea-dhcp4 or kea-dhcp6 is being fuzzed. -# Once the interface has been decided, these need to be set in the configuration file used for the test. For example, to fuzz Kea-dhcp4 using the loopback interface "lo" and IPv4 address 10.53.0.1, the configuration file would contain the following snippet: @code { "Dhcp4": { "interfaces-config": { "interfaces": [ "lo/10.53.0.1" ] }, "subnet4": [ { "interface": "lo" } ] } } @endcode -# The specification of the interface and address in the configuration file is used by the main Kea code. Owing to the way that the fuzzing interface between Kea and AFL is implemented, the address and interface also need to be specified by the environment variables KEA_AFL_INTERFACE and KEA_AFL_ADDRESS. With a configuration file containing statements listed above, the relevant commands are: @code export KEA_AFL_INTERFACE="lo" export KEA_AFL_ADDRESS="10.53.0.1" @endcode (If kea-dhcp6 is being fuzzed, then KEA_AFL_ADDRESS should specify an IPv6 address.) -# The fuzzer can now be run: a suitable command line is: @code afl-fuzz -m 4096 -i seeds -o fuzz-out -- ./kea-dhcp6 -c kea.conf -p 9001 -P 9002 @endcode In the above: - It is assumed that the directory holding the "afl-fuzz" program is in the path, otherwise include the path name when invoking it. - "-m 4096" allows Kea to take up to 4096 MB of memory. (Use "ulimit" to check and optionally modify the amount of virtual memory that can be used.) - The "-i" switch specifies a directory (in this example, one named "seeds") holding "seed" files. These are binary files that AFL will use as its source for generating new packets. They can generated from a real packet stream with wireshark: right click on a packet, then export as binary data. Ensure that only the payload of the UDP packet is exported. - The "-o" switch specifies a directory (in this example called "fuzz-out") that AFL will use to hold packets it has generated and packets that it has found causing crashes or hangs. - "--" Separates the AFL command line from that of Kea. - "./kea-dhcp6" is the program being fuzzed. As mentioned above, this should be an executable image, and it will be simpler to fuzz one that has been installed. - The "-c" switch sets the configuration file Kea should use while being fuzzed. - "-p 9001 -P 9002". The port on which Kea should listen and the port to which it should send replies. If omitted, Kea will try to use the default DHCP ports, which are in the privileged range. Unless run with "sudo", Kea will fail to open the port and Kea will exit early on: no useful information will be obtained from the fuzzer. -# Check that the fuzzer is working. If run from a terminal (with a black background - AFL is particular about this), AFL will bring up a curses-style interface showing the progress of the fuzzing. A good indication that everything is working is to look at the "total paths" figure. Initially, this should increase reasonably rapidly. If not, it is likely that Kea is failing to start or initialize properly and the logging output (assuming this has been configured) should be examined. @subsection fuzzRunConfig Fuzzing with Configuration Files AFL can be used to check the parsing of the configuration files. In this type of fuzzing, AFL generates configuration files which it passes to Kea to check. Steps for this fuzzing are: -# Build Kea as described above. -# Create a dictionary of keywords. Although AFL will mutate the files by byte swaps, bit flips and the like, better results are obtained if it can create new files based on keywords that could appear in the file. The dictionary is described in the AFL documentation, but in brief, the file contains successive lines of the form 'variable=keyword"', e.g. @code PD_POOLS="pd-pools" PEERADDR="peeraddr" PERSIST="persist" PKT="pkt" PKT4="pkt4" @endcode "variable" can be anything, as its name is ignored by AFL. However, all the variable names in the file must be different. "keyword" is a valid keyword that could appear in the configuration file. The convention adopted in the example above seems to work well - variables have the same name as keywords, but are in uppercase and have hyphens replaced by underscores. -# Run Kea with a command line of the form: @code afl-fuzz -m 4096 -i seeds -o fuzz-out -x dict.dat -- ./kea-dhcp4 -t @@ @endcode In the above command line: - Everything up to and including the "--" is the AFL command. The switches are as described in the previous section apart from the "-x" switch: this specifies the dictionary file ("dict.dat" in this example) described above. - The Kea command line uses the "-t" switch to specify the configuration file to check. This is specified by two consecutive "@" signs: AFL will replace these with the name of a file it has created when starting Kea. @subsection fuzzInternalNetwork Fuzzing with Network Packets The AFL fuzzer delivers packets to Kea's stdin. Although the part of Kea concerning the reception of packets could have been modified to accept input from stdin and have Kea pick them up in the normal way, a less-intrusive method was adopted. The packet loop in the main server code for kea-dhcp4 and kea-dhcp6 is essentially: @code{.unparsed} while (not shutting down) { Read and process one packet } @endcode When --enable-fuzzing is specified, this is conceptually modified to: @code{.unparsed} while (not shutting down) { Read stdin and copy data to address/port on which Kea is listening Read and process one packet } @endcode Implementation is via an object of class "PacketFuzzer". When created, it identifies an interface, address and port on which Kea is listening and creates the appropriate address structures for these. The port is passed as an argument to the constructor because at the point at which the object is constructed, that information is readily available. The interface and address are picked up from the environment variables mentioned above. Consideration was given to extracting the interface and address information from the configuration file, but it was decided not to do this: -# The configuration file can contain the definition of multiple interfaces; if this is the case, the one being used for fuzzing is unclear. -# The code is much simpler if the data is extracted from environment variables. Every time through the loop, the object reads the data from stdin and writes it to the identified address/port. Control then returns to the main Kea code, which finds data available on the address/port on which it is listening and handles the data in the normal way. In practice, the "while" line is actually: @code{.unparsed} while (__AFL_LOOP(count)) { @endcode __AFL_LOOP is a token recognized and expanded by the AFL compiler (so no need to "#include" a file defining it) that implements the logic for the fuzzing. Each time through the loop (apart from the first), it raises a SIGSTOP signal telling AFL that the packet has been processed and instructing it to provide more data. The "count" value is the number of times through the loop before the loop terminates and the process is allowed to exit normally. When this happens, AFL will start the process anew. The purpose of periodically shutting down the process is to avoid issues raised by the fuzzing being confused with any issues associated with the process running for a long time (e.g. memory leaks). @subsection fuzzInternalConfig Fuzzing with Configuration Files No changes were required to Kea source code to fuzz configuration files. In fact, other than compiling with afl-clang++ and installing the resultant executable, no other steps are required. In particular, there is no need to use the "--enable-fuzzing" switch in the configuration command line (although doing so will not cause any problems). @subsection fuzzThreads Changes Required for Multi-Threaded Kea The early versions of the fuzzing code used a separate thread to receive the packets from AFL and to write them to the socket on which Kea is listening. The lack of synchronization proved a problem, with Kea hanging in some instances. Although some experiments with thread synchronization were successful, in the end the far simpler single-threaded implementation described above was adopted for the single-threaded Kea 1.6. Should Kea be modified to become multi-threaded, the fuzzing code will need to be changed back to reading the AFL input in the background. @subsection fuzzNotesUnitTests Unit Test Failures If unit tests are built when --enable-fuzzing is specified and with the AFL compiler, note that tests which check or use the DHCP servers (i.e. the unit tests in src/bin/dhcp4, src/bin/dhcp6 and src/bin/kea-admin) will fail. With no AFL-related environment variables defined, a C++ exception will be thrown with the description "no fuzzing interface has been set". However, if the `KEA_AFL_INTERFACE` and `KEA_AFL_ADDRESS` variables are set to valid values, the tests will hang. Both these results are expected and should cause no concern. The exception is thrown by the fuzzing object constructor when it attempts to create the address structures for routing packets between AFL and Kea but discovers it does not have the necessary information. The hang is due to the fact that the AFL processing loop does a synchronous read from stdin, something not expected by the test. (Should random input be supplied on stdin, e.g. from the keyboard, the test will most likely fail as the input is unlikely to be that expected by the test.) */