Getting started with fuzzing your Django web app

10 Feb 2021

One of my clients recently started running some automated third-party security tests against their long-lived Django-based web application, and inevitably the mass of random probing threw up a bunch of errors as the site failed to cope with the wide range of unexpected inputs. No security issues turned up, but it did expose a bunch of failures to handle unexpected data in a predictable fashion, and that suggests room for improvement.

As humans, we developers tend to focus on the positive functional path when writing code, so when unexpected data turns up it can start to expose weak points in code where 'i's were not dotted, and 't's not crossed. Even with extensive unit testing, which this project has, those tests tend to focus on positive test data and a few obvious bits of bad data, so when genuinely unexpected data turns up it can start to show gaps.

Whilst it may seem like just an annoyance, malformed data being passed into a chunk of code can be a serious problem. Most frequently this unexpected data just leads to the code failing to carry out its desired function and returning an error, but it can also lead to security issues when you have unexpected data floating around your code (for example, dictionaries with extra unexpected parameters on them), particularly when that data makes it as far as third-party libraries that the attacker may have had time to study better than you.

Whilst the best techniques to try to combat this sort of thing are using strictly typed languages and solid code review, even then it is likely things will get through. And, as was the case here, if your app is built upon a popular language that doesn't have very strict type checking, what can you do to try and expose this sort of issue early?

One technique you can use for this, and one that's been common in the security industry for a while, is fuzzing, and now thanks to a Google project called Atheris you can bring this to your Python-based code. So I decided to try and help my client with their large and elderly Django project to shake out input errors by automating fuzzing of their code. What I’ve written up here is how I went about it, so as to hopefully encourage others to give this a go.

What is fuzzing?

Let’s start with a simple question: what is fuzzing, and how can it help us?

At its simplest, fuzzing is a structured way of firing random data into your program to see where it breaks. Fuzzing involves running your code in such a way that the fuzzer (a sort of test runner) can see what code paths were exercised by the data it passes to said code, and it'll note when changing some bit of the input causes another code path to be taken. By doing this again and again, over time, the fuzzer will begin to work out how your code responds to input and work out how to exercise your code by “understanding” the inputs. This means that over time it'll do a much better job of exercising your system than just firing random data at it alone.
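To make that feedback loop concrete, here is a toy sketch of the idea in plain Python. It is not how Atheris or the clang fuzzing library work internally (they are far more sophisticated), and the names target, run_with_coverage and toy_fuzz are all made up for this illustration. It mutates one byte at a time and keeps any input that reaches a line of code it hasn't seen before, which is enough to find an input that purely random guessing would need millions of attempts to hit.

```python
import random
import sys


def target(data: bytes) -> None:
    """Toy code under test: it only fails on the input b"FUZ"."""
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                raise ValueError("crash!")


def run_with_coverage(func, data):
    """Run func(data); return (set of lines executed, whether it raised)."""
    lines = set()

    def tracer(frame, event, arg):
        # Record every line executed in frames entered while tracing is on.
        if event == "line":
            lines.add((frame.f_code.co_name, frame.f_lineno))
        return tracer

    sys.settrace(tracer)
    try:
        func(data)
        crashed = False
    except Exception:
        crashed = True
    finally:
        sys.settrace(None)
    return lines, crashed


def toy_fuzz(func, max_runs=200_000, seed=0):
    """Mutate inputs, keeping any input that reaches a line not seen before."""
    rng = random.Random(seed)
    corpus = [b"\x00\x00\x00"]  # inputs that are worth mutating further
    seen_lines = set()
    for _ in range(max_runs):
        # Take a known-interesting input and change one byte of it.
        candidate = bytearray(rng.choice(corpus))
        candidate[rng.randrange(len(candidate))] = rng.randrange(256)
        candidate = bytes(candidate)
        lines, crashed = run_with_coverage(func, candidate)
        if crashed:
            return candidate
        if not lines <= seen_lines:
            # New code was reached, so this input taught us something.
            seen_lines |= lines
            corpus.append(candidate)
    return None  # nothing found within the budget


crashing_input = toy_fuzz(target)
print(crashing_input)  # the only 3-byte input that crashes target is b"FUZ"
```

Each nested if in target is a new code path, so every correct byte the fuzzer stumbles on gets saved and built upon, rather than it having to guess all three bytes at once.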

I remember one of the first examples of fuzzing someone showed me was running a fuzzer on an image library, and over time the fuzzer started generating valid images with no prior knowledge, just based on firing sample data into the code and seeing how it reacted!

So if you have a system where you want to check that you're exercising all data paths, either because writing complete coverage in unit tests is not tractable or because you rely on third-party libraries that you’re unsure about, then a fuzzer is an excellent way to do this.

Getting set up with Atheris

Now that we know what fuzzing is, how can I apply it to my client's Django app? Thankfully, some engineers at Google have already done a lot of the hard work for me: they have released Atheris, which uses the clang fuzzing library (libFuzzer) to let you fuzz Python code. We just need to get that working with our Django project.

The first step is a devops one: we need to make sure we have an environment that we can run our Django app in and that can host the clang fuzzing library. The project I’m interested in testing is hosted using the official Python docker containers, so I’m going to do everything in docker containers, but docker isn’t necessary for any of this: you just need to find a way to get the computer you’re running tests on to have a recent clang build (ideally version 12 or above) in addition to your Django set up.

Atheris would not work for me out of the box: the standard python docker images are based on the current stable Debian release, Buster, and to install Atheris I needed to use a container image based off Debian Bullseye, the next release due, which is currently unstable. This is because the Clang packages rely on gcc 10, and Buster only supports up to gcc 8.

Thankfully, switching over from python:slim-buster to debian:bullseye is mostly just a matter of installing python alongside other apt packages. I ended up making a second Dockerfile for fuzzing that looks a little like this:

FROM debian:bullseye AS clangbuilder

# Install some bits that let us add the clang package repository
RUN apt-get update -qqy && \
    apt-get install -qy \
        wget \
        gnupg

# Install the key and source location for the clang package repository
RUN wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
RUN echo 'deb http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye main' >> /etc/apt/sources.list

# Install clang and python
RUN apt-get update -qqy && \
    apt-get install -qy \
        clang-format clang-tidy clang-tools clang clangd libc++-dev libc++1 libc++abi-dev libc++abi1 libclang-dev libclang1 liblldb-dev libllvm-ocaml-dev libomp-dev libomp5 lld lldb llvm-dev llvm-runtime llvm python-clang \
        pip \
        python-is-python3 \
        gcc \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf /var/cache/apt/*

...

RUN pip install -r requirements.txt
RUN pip install atheris

...

The gaps there remain the same as the Dockerfile for using python:slim-buster - use this as a starting point and then add at the end the usual steps you’d use for installing your Django app. A couple of notes:

The long list of packages for clang comes from their install shell script - there's possibly more than I need there, but I just went with what the default clang install script would have done.
I'm installing Atheris manually, not from requirements.txt, as that file has to work for our production image still based on python:slim-buster.
I had to install python-is-python3, which maps the "python3" binary to just "python". If your scripts for running the Django server are happy calling "python3" you can skip that.

Now we have that out the way, we can get on to actually doing the fuzzing!

My first fuzzing test

In fuzzing, as with any testing, you need to decide what it is you actually want to test. In general fuzzing will give you more interesting results the tighter a focus you can give it: the fuzzer is trying to learn your code by throwing structured random data in and watching for code path changes, so the fewer code paths you test over the more likely you are to see results in a meaningful time. Fuzzing is something that you’ll likely be running for hours to see if it finds anything, so you want it to use that time efficiently.

Similarly, it depends on where your concerns lie with your product. If you trust the Django team to have done their homework you might want to fuzz your views directly, rather than coming in from a client that invokes the entire Django stack, though if you have some of your code run as middleware you might need to fuzz from the HTTP request handler to ensure those are covered. Alternatively, if you have a particularly gnarly internal library for parsing some weird data formats you might want to cut Django out the loop entirely and call that code directly to ensure that it gets stressed properly. It's your call, but just know that fuzzing isn't a quick process, so the tighter a focus you can have the sooner you'll get meaningful results, so it's worth thinking through where you want to apply this technique and what you hope to get from it.

The way the Atheris library works is you provide a single function that is effectively your test case method. Atheris will call this repeatedly with some seemingly arbitrary data, and you have to pass that test data to your code in some meaningful way. Either your code will "work" or it'll raise an exception, which will cause Atheris to stop and declare it has found an issue. In classic test fashion you can cause that exception by asserting that return results from your code are sensible (e.g., your HTTP request return was never a 5xx status code). Whilst your code runs Atheris will be watching what instructions are executed to try and associate different code paths with different bits in the input data to work out how to exercise more of your code.

So, for a simple example we can do:

import sys

import atheris
import django
from django.test import Client

django.setup()

def test_my_code(data):
  url = '/' + data.decode('latin2')
  client = Client()
  response = client.get(url)
  if response.status_code not in [200, 302, 404]:
    raise RuntimeError(f"Unexpected status code {response.status_code} for {url}")

atheris.Setup(sys.argv, test_my_code)
atheris.Fuzz()

This is a very simple version of the example in Tomasz Nowak's excellent article that led me down this path - his version is better, but I've cut it down for readability here. Basically we take the randomish data Atheris has provided, turn it into a URL, and see if we get an unexpected response from our server, or if the server throws any unhandled exceptions. Over time Atheris will start to see that different URLs cause different behaviours in the code, and it'll build up a model to let it keep stressing your app better.

Starting to stress my Django app more directly

For my testing I wanted to start by recreating the kind of tests my client was getting thrown at them in their external security testing: calling URLs with some random GET or POST data. I knew from the logs from the server during testing that the application was throwing some 500 errors as a result of the probes, and I wanted to exercise the code base to try and weed more of those out.

My first step was to not waste time having Atheris figure out what URLs were valid or not: I knew that already, as Django makes you define this! So I had some code to walk the URL patterns and flatten them for me:

from django.urls import URLPattern, URLResolver

def flatten_urls(namespace, patterns):
    # Walk the URL tree, collecting (namespace, URLPattern) pairs
    res = []
    for url in patterns:
        if isinstance(url, URLPattern):
            res.append((namespace, url))
        elif isinstance(url, URLResolver):
            try:
                if url.urlconf_module.app_name not in IGNORE_APPS:
                    res += flatten_urls(url.namespace, url.url_patterns)
            except AttributeError:
                # Included URL confs without an app_name are skipped
                pass
    return res

from urls import urlpatterns
test_urls = flatten_urls(None, urlpatterns)

Two things to note here:

I'm storing both the URL pattern and the namespace it's in because I'll use the Django reverse method to get an actual URL from this later in this process (keep reading!)
I have an array of apps I don't want to test called IGNORE_APPS that I used to exclude some Django apps from testing, such as the built in Django admin interface or say Django Test Framework. These apps have URLs exposed in our test environment that I don't want to have fuzzed: partly this is to save time, but it's also because I found issues in some of these, as they're not meant to be deployed in production, so I felt it was best to just skip those for now. Again, you just need to think about what your focus is with testing like this.

Once I have a list of URLs I want to have Atheris select one on each invocation of our test function.

For this it's important to make sure that all inputs to your code under test are derived from the test data that Atheris provides, so that Atheris can work out what changes to the input data cause the different paths in your code to be exercised. You don't want to, say, just round robin through the URL list or pick at random each time - you have to let Atheris drive this, otherwise it can’t correlate between the data it provided and how your code reacted.

Thankfully Atheris comes with a handy class that will let you use the raw data it provides in a more structured way. So at the start of each test I use this to select which URL is being used:

def test_one_url(data):
    fdp = atheris.FuzzedDataProvider(data)
    urlpattern_to_test = fdp.PickValueInList(test_urls)
    method = fdp.PickValueInList(["GET", "POST", "PUT", "HEAD", "DELETE"])
    ...

Thanks to this Atheris now is in control of which URL is selected, and can relate changes in URL called to changes in the code executed. Similarly I've done the same for the HTTP request method I'm going to use.
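If it isn't obvious why this matters, here's a small illustration in plain Python. The pick_from_data function is my own simplified stand-in, not how Atheris's FuzzedDataProvider actually makes its choice, and the URLs are made up. The point is only that the same input bytes must always lead to the same choice.

```python
import random


def pick_from_data(data: bytes, options: list):
    """Choose an option using only the fuzzer's bytes (illustrative only)."""
    index = data[0] % len(options) if data else 0
    return options[index]


def pick_at_random(data: bytes, options: list):
    """Anti-pattern: the choice ignores the fuzzer's input entirely."""
    return random.choice(options)


urls = ["/", "/users/", "/photos/"]  # hypothetical URLs
sample = b"\x01rest-of-input"

# Replaying the same bytes always exercises the same URL...
assert pick_from_data(sample, urls) == pick_from_data(sample, urls) == "/users/"
# ...and changing the input changes the choice in a way the fuzzer can learn.
assert pick_from_data(b"\x02rest-of-input", urls) == "/photos/"
```

With pick_at_random, a crash found on one run may not reproduce when the fuzzer replays the same bytes, and the fuzzer has no way to learn which bytes lead to which URL.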

I've now picked a URL pattern to use, and I need to convert it to an actual URL string I can pass to Django's test client, similar to the first code snippet we used. For this I can use Django's built in reverse function like so:

    # Remember above we packed the namespace and the URLpattern into a tuple!
    namespace, urlpattern = urlpattern_to_test
    name = urlpattern.name
    if namespace is not None:
        name = f"{namespace}:{name}"
    url_to_test = reverse(name)

At this point you're going to get some URLs that work, but an awful lot that will throw a NoReverseMatch exception as they expect arguments in the URL. Thankfully, with Django's typed URL definitions we can use the FuzzedDataProvider to fill in the gaps with interesting test data.

For example, imagine we have some URL patterns in Django defined thus:

urlpatterns = [
    path('', views.home, name='home'),
    path('users/<str:username>/', views.user_home, name='user_home'),
    path('users/<str:username>/photo/<int:photo_id>/', views.user_photo, name='user_photo'),
]

Here the first URL doesn't need any test data in the URL pattern itself, the second URL needs a string to define the user name, and the final URL needs a user name and the number of a photo they have uploaded. I'm going to trust that the Django parameter conversion functions work for now, and I only want to pass in valid data to stress my own code. To do that, I have my tests generate meaningful test data in the URLs by inspecting the urlpattern object itself to see what arguments it requires:

  pattern = urlpattern.pattern
  if isinstance(pattern, django.urls.resolvers.RoutePattern):
      kwargs = {}
      for key in pattern.converters:
          if isinstance(pattern.converters[key], StringConverter):
              # strlen of 0 will be failed by reverse, so always try for more
              str_length = fdp.ConsumeUInt(3) + 1
              kwargs[key] = fdp.ConsumeUnicode(str_length)
          elif isinstance(pattern.converters[key], IntConverter):
              kwargs[key] = fdp.ConsumeInt(3)
          else:
              # A reminder if I start adding more types to my URLs...
              raise ValueError(f"unexpected converter for {key} in {name}")
      try:
          url = reverse(name, kwargs=kwargs)
      except NoReverseMatch:
          # This can happen if string args are '' for instance
          return
      except UnicodeEncodeError:
          # Not all data returned by FuzzedDataProvider.ConsumeUnicode is necessarily
          # valid unicode as they try to test unicode parsing, but here it just
          # causes us to fail in reverse, so skip it.
          return

Things to note about this block of code:

What I'm doing here only works for the newer path based URL definitions in Django, not the ones that are based on regexes. For now, because they're in a minority, I'm just not fuzzing the few in my client’s app that are still regex based (I’ll get there in time!). If you wanted to you could try spotting patterns in those and replacing them using positional arguments, but in general I'm in favour of defining your own converter where the standard ones aren't sufficient, and then I could just extend this code with a branch for each type (always, of course, deriving the data from FuzzedDataProvider).
If Django expects a string in a URL pattern (like we do with username) it won’t accept an empty string, so I try to account for that by adding one to the number returned by FuzzedDataProvider for the string length. However, this isn't a guarantee that you'll get a string of length one and above when you call ConsumeUnicode. If the data block provided to the test function is short, then it might not have any data to use in ConsumeUnicode, so you'll still get an empty string returned there - thus I still need to handle NoReverseMatch exceptions from when that happens.
The ConsumeUnicode function doesn't always return valid unicode, and that also causes the Django test client to get upset when parsing the URL, and I don't count that as a failure of the Django code my client has written, so I skip that.
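To show what those last two notes mean in practice, here is a small check in plain Python. The helper name is_usable_url_text is made up, and this is an alternative to catching the exceptions as I do above rather than what my test script does. I believe FuzzedDataProvider also has a ConsumeUnicodeNoSurrogates method that avoids the second case, but check the Atheris README for your version.

```python
def is_usable_url_text(value: str) -> bool:
    """Return True if value is non-empty and can be encoded as UTF-8."""
    if not value:
        return False  # an empty string cannot fill a <str:...> URL segment
    try:
        value.encode("utf-8")
    except UnicodeEncodeError:
        return False  # e.g. a lone surrogate such as "\udc80"
    return True


print(is_usable_url_text("alice"))   # True
print(is_usable_url_text(""))        # False
print(is_usable_url_text("\udc80"))  # False
```

You could call something like this on each generated string and return early from the test function when it fails, before ever calling reverse.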

So, we've done a bunch of introspection trickery here, but it ensures that each URL called will be valid for our application, which saves a lot of fuzzing time from calling URLs that will only return 404s, and we've got a way to provide different arguments into our code (in the example username and photo_id) to stress it further.
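On the note above about defining your own converter: a custom converter in Django is just a small class with a regex and two methods, so generating fuzz data for one is a matter of adding another branch. As a sketch, here is a four digit year converter along the lines of the one in Django's URL documentation, plus a hypothetical helper (fuzz_value_for_year is my name, not part of any library) that squeezes a fuzzer-provided integer into the range it accepts.

```python
class FourDigitYearConverter:
    """A custom path converter, in the shape Django's URL docs describe."""

    regex = "[0-9]{4}"

    def to_python(self, value: str) -> int:
        return int(value)

    def to_url(self, value: int) -> str:
        return "%04d" % value


def fuzz_value_for_year(raw: int) -> int:
    """Map any fuzzer-provided integer onto the range the converter accepts."""
    return abs(raw) % 10000


# In a Django project you would register the converter with:
#   from django.urls import register_converter
#   register_converter(FourDigitYearConverter, "yyyy")
# and then add a branch to the fuzz test along the lines of:
#   elif isinstance(pattern.converters[key], FourDigitYearConverter):
#       kwargs[key] = fuzz_value_for_year(fdp.ConsumeInt(3))
```

The value still comes from FuzzedDataProvider, so Atheris stays in control of it. The helper only makes sure we don't waste runs on values that reverse would reject anyway.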

With this alone I started finding unhandled invalid data paths in my client's application that resulted in 500s that were for the most part quick and easy to fix, and now it returns sensible 4xx results for these.

To go a bit further and test data provided not via the URL but by query and post parameters I took a leaf from another example I found, httpfuzz, and I used Atheris's FuzzedDataProvider to give me chunks of data to provide to GET methods as query parameters or POST methods in data:

    data_length = fdp.ConsumeUInt(3)
    if method == "GET":
        response = client.get(url + "?" + fdp.ConsumeUnicode(data_length))
    elif method == "POST":
        response = client.post(url, fdp.ConsumeBytes(data_length), content_type="application/binary")
    ...

Same proviso as before: you can't guarantee on any given execution how much data you get back from these, but over time Atheris will use it to probe your code in detail as it tries longer data samples.
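Once you have a response you need to decide what counts as a failure. As a sketch of the "never a 5xx" check mentioned earlier, something like the following would do. The function name check_response is made up, and treating every 4xx as acceptable is an assumption you may want to tighten for your own app.

```python
def check_response(status_code: int, method: str, url: str) -> None:
    """Raise if the app answered with a server error, so the fuzzer records a finding."""
    if 500 <= status_code <= 599:
        raise RuntimeError(f"{method} {url!r} returned {status_code}")


# Fine: the app rejected bad input cleanly.
check_response(404, "GET", "/users/nobody/")
# Fine: refusing an unsupported method is a sensible answer.
check_response(405, "DELETE", "/users/nobody/")
```

In the fuzz test this would be called as check_response(response.status_code, method, url). Note that recent versions of Django's test client re-raise exceptions from views by default, so many 500s will reach Atheris as the original exception before this check ever runs.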

And that’s how you get started fuzzing!

And that's all there is to it for my first Django fuzzing test. I've put a full version of the snippets I used in my basic test script in this gist for you to look at, but don't think of this as "what you need to do to fuzz your Django app" - it's an example of a very specific test, and you should think about where in your Django app you'd benefit from stressing it with unexpected data to try to shake out any issues, not just assume the problems I was trying to solve are the same as your own gaps. Perhaps there's a specific view you're worried about (e.g., an uploader view) or you have some non-standard log in code. Or perhaps you do just want to take a broader scatter approach over the entire app so over time you weed out bugs like these.

There are other things beyond just the URLs to consider: you could check if URL calls are generating database events when they shouldn't. You could seed the database with some valid data to try to expose more of your code paths. You could run as a logged in user rather than, as I've done here, not being logged in. There are as many ways to fuzz your application as there are ways to write tests for it.