Ed Crewe: WSGI functional benchmark for a Django Survey Application

I am currently involved in the redevelopment of a survey creation tool, that is used by most of the UK University sector. The application is being redeveloped in Django, creating surveys in Postgresql and writing the completed survey data to Cassandra.
The core performance bottleneck is likely to be the number of concurrent users who can simultaneously complete surveys. As part of the test tool suite we have created a custom Django command that uses a browser robot to complete any survey with dummy data.
I realised when commencing this WSGI performance investigation that this functional testing tool could be adapted to act as a load testing tool.
So rather than just getting general request statistics - I could get much more relevant survey completion load data.

There are a number of more thorough benchmark posts of raw pages using a wider range of WSGI servers - eg. http://nichol.as/benchmark-of-python-web-servers , however they do not focus so much on the most common ones that serve Django applications, or address the configuration details of those servers. So though less thorough, I hope this post is also of use.

The standard configuration to run Django in production is the dual web server set up. In fact Django is pretty much designed to be run that way, with contrib apps such as static files provided to collect images, javascript, etc. for serving separately to the code. Recognizing that in production a web server optimized for serving static files is going to be very different from one optimized for a language runtime environment, even if they are the same web server, eg. Apache. So ideally it would be delivered via two differently configured, separate server Apaches. A fast and light static configured Apache on high I/O hardware, and a mod_wsgi configured Apache on large memory hardware. In practise Nginx may be easier to configure for static serving, or for a larger globally used app, perhaps a CDN.
This is no different from optimising any web application runtime, such as Java Tomcat. Separate static file serving always offers superior performance.

However these survey completion tests, are not testing static serving, simpler load tests suffice for that purpose. They are testing the WSGI runtime performance for a particular Django application.

Conclusions

Well you can draw your own, for what load you require, of a given set hardware resource! You could of course just upgrade your hardware :-)

However clearly uWSGI is best for consistent performance at high loads, but
Apache MPM worker outperforms it when the load is not so high. This is likely to be due to the slightly higher memory per thread that Apache uses compared to uWSGI

Using the default Apache MPM process may be OK, but can make you much more open to DOS attacks, via a nasty performance brick wall. Whilst daemon mode may result in more timeout fails as overloading occurs.

Gunicorn is all Python so easier to set up for multiple django projects on the same hardware, and performs consistently across different loads, if not quite as fast overall.

I also tried a couple of other python web servers, eg. tornado, but the best I could get was over twice as slow as these three servers, they may well have been configured incorrectly, or be less suited to Django, either way I did not pursue them.

Oh and what will we use?

Well probably Apache MPM worker will do the trick for us, with a separate proxy front-end Apache configured for static file serving.
At least that way, its all the same server that we need to support, and one that we are already well experienced in. Also our static file demands are unlikely to be sufficient to warrant use of Nginx or a CDN.

I hope that these tests may help you, if not make a decision, maybe at least decide to try out testing a few WSGI servers and configs, for yourself. Let me know if your results differ widely from mine. Especially if there are some vital performance related configuration options I missed!

Running the functional load test

To run the survey completion tool via number of concurrent users and collect stat.s on this, I wrapped it up in test scripts for locust.

So each user completes one each of seven test surveys.
The locust server can then be handed the number of concurrent users to test with and the test run fired off for 5 minutes, over which time around 3-4000 surveys are completed.

The number of concurrent users tested with was 10, 50 and 100
With our current traffic peak loads will probably be around the 20 users mark with averages of 5 to 10 users. However there are occasional peaks higher than that. Ideally with the new system we will start to see higher traffic, where the 100 benchmark may be of more relevance.

Fails

A number of bad configs for the servers produced a lot of fails, but with a good config these seem to be very low. So all 3 x 5 minute test runs for each setup created around 10,000 surveys, these are the actual number of fails in 10,000
so insignificant perhaps ...

Apache MPM process = 1
Apache MPM worker = 0
Apache Daemon = 4
uWSGI = 0
Gunicorn = 1

(so the fastest two configs both had no fails, because neither ever timed out)

Configurations

The test servers were run on the same virtual machine, the spec of which was
a 4 x Intel 2.4 GHz CPU machine with 4Gb RAM
So optimum workers / processes = 2 * CPUs + 1= 9

The following configurations were arrived at by tinkering with the settings for each server until optimal speed was achieved for 10 concurrent users.
Clearly this empirical approach may result in very different settings for your hardware, but at least it gives some idea of the appropriate settings - for a certain CPU / memory spec. server.

For Apache I found things such as WSGIApplicationGroup being set or not was important, hence its inclusion, with a 20% improvement when on for MPM prefork or daemon mode, or off for MPM worker mode.

Apache mod_wsgi prefork

WSGIScriptAlias / /virtualenv/bin/django.wsgi
WSGIApplicationGroup %{GLOBAL}

Apache mod_wsgi worker

WSGIScriptAlias / /virtualenv/bin/django.wsgi

<IfModule mpm_worker_module>
# ThreadLimit    1000
    StartServers         10
    ServerLimit          16
    MaxClients          400
    MinSpareThreads      25
    MaxSpareThreads     375
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>

Apache mod_wsgi daemon

WSGIScriptAlias / /virtualenv/bin/django.wsgi
WSGIApplicationGroup %{GLOBAL}

WSGIDaemonProcess testwsgi \
    python-path=/virtualenv/lib/python2.7/site-packages \
    user=testwsgi group=testwsgi \
    processes=9 threads=25 umask=0002 \
    home=/usr/local/projects/testwsgi/WWW \
    maximum-requests=0

WSGIProcessGroup testwsgi

uWSGI

uwsgi --http :8000 --wsgi-file wsgi.py --chdir /virtualenv/bin \
--workers=9 --buffer-size=16384 --disable-logging

Gunicorn

django-admin.py run_gunicorn -b :8000 --workers=9 --keep-alive=5

Ed Crewe

Pages

Monday, 6 January 2014

WSGI functional benchmark for a Django Survey Application