Wednesday, October 29, 2014

Fixing third party Django packages for Python 3

With the release of Django 1.7 it could be argued that the balance has finally tipped towards Python 3 being its preferred platform. Given that Python 2.7 is the last of the 2.* releases, it's probably time we all thought about moving to Python 3 for our Django deployments.

The problem is those pesky third party package developers, because unless you are a determined wheel reinventor (unlikely if you use Django!) you are bound to have a range of third party eggs in your Django sites. As one of those pesky third party developers myself, it is about time I added Python 3 compatibility to my Django open source packages.

There are a number of resources on porting Python from 2 to 3, including some specifically for Django, but hopefully this post may still prove useful as a summarised approach for doing it for your Django projects or third party packages. Hopefully it isn't too much work, and if you have been writing Python as long as me, it may also get you out of any legacy syntax habits you have.

So let's get started. The first thing is to set up Django 1.7 with Python 3.
For repeatable builds we want pip and virtualenv, so install them if they are not already there.
On a Linux platform such as Ubuntu you will have python3 installed as standard (although not yet as the default python), so if you just add pip3 that lets you add the rest ...

Install Python 3 and Django for testing


sudo apt-get install python3-pip
(OR sudo easy_install3 pip)
sudo pip3 install virtualenv



So now you can run virtualenv with python3 in addition to the default python (2.*)

virtualenv --python=python3 myenv3
cd myenv3
bin/pip install django


Then add a src directory to hold the egg you want to make compatible with Python 3. Here is an example from git (of course you can do this as one pip line if the source is in git)


mkdir src
git clone https://github.com/django-pesky src/django-pesky
bin/pip install -e src/django-pesky


Then run the django-pesky tests (assuming nobody uses an egg without any tests!).
The command to run pesky's tests may be something like the following ...

bin/django-admin.py test pesky.tests --settings=pesky.settings

One rather disconcerting thing you will notice with tests is that the default assertEqual message is truncated in Python 3, where it wasn't in Python 2, with a count of the missing characters in square brackets, e.g.

AssertionError: Lists differ: ['Failed to open file /home/jango/myenv/sr[85 chars]tem'] != []


Common Python 2 to Python 3 errors


Run the tests and wait for those errors. The most common ones are:

  1. print statement without brackets - print is a function in Python 3, so use print('text')
  2. except Error as err (NOT except Error, err)
  3. File open and file methods differ.
    Text files need an explicit, correct encoding, and more file data is handled as bytes by default, because strings in Python 3 are all stored as unicode
    (On the down side this may need more work for initial encoding clean up *,
    but on the plus side functional errors due to bad encoding are less likely to occur)
  4. There is no unicode() function in Python 3 since all strings are now unicode - i.e. it has become str() - and hence strings no longer need the u'string' marker
  5. Since unicode is not available, __unicode__ is not used for the default representation of Django models. Hence just using
    def __str__(self):
        return self.name
    is the future proofed method. I actually found that, in Django 1.7 and Python 3, models with both __unicode__ and __str__ methods may not return any representation, rather than the __str__ one being used as one might assume
  6. dictionary has_key has gone, you must use in (if key in dict)

* I found more raw strings were treated as bytes by Python 3 and these then required raw_string.decode(charset) to avoid them going into the database string (eg. varchar) fields as pseudo-bytes, ie. strings that held 'élément' as '\xc3\xa9l\xc3\xa9ment' rather than bytes, ie. b'\xc3\xa9l\xc3\xa9ment'
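
As a rough illustration of the list above, here is a minimal sketch of the same idioms written so that they run unchanged on Python 2.7 and 3. The describe function and the settings dictionary are made up for the example, they are not taken from any of my packages.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def describe(settings, key):
    """Return a message using idioms valid on both Python 2.7 and 3."""
    # 6. 'in' replaces dict.has_key()
    if key in settings:
        return "found %s" % key
    try:
        settings[key]
    # 2. 'except ... as err' replaces 'except KeyError, err'
    except KeyError as err:
        return "missing %s" % err.args[0]


# 1. print is a function
print(describe({"DEBUG": True}, "DEBUG"))

# 3 / 4. bytes have to be decoded explicitly before they become text
raw = b"\xc3\xa9l\xc3\xa9ment"
text = raw.decode("utf-8")
print(text == "élément")
```

The __future__ imports do nothing on Python 3, but on 2.7 they make print a function and make plain string literals unicode, which is what lets the last comparison behave the same on both.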

Ideally you will want to maintain one version but keep it compatible with Python 2 and 3,
since this is both less work and gets you into the habit of writing transitional Python :-)

Test the same code against Python 2 and 3


So to do that you want to be running your tests with builds in both Pythons.
So repeat the above but with virtualenv --python=python2 myenv2
and just symlink the src/django-pesky to the Python 2 src folder.

Now you can run the tests for both versions against the same egg code -
and make sure when you fix for 3 you don't break for 2.

For current Django 1.7 you would just need to support the latest Python 2.7
and so the above changes are all compatible except for use of unicode() and how you call open().
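
For those two exceptions, a minimal sketch of one way round them, assuming UTF-8 files: io.open takes an encoding argument on both Python 2.7 and 3, and a small alias stands in for unicode(). The names text_type and read_text are my own for this example (Django 1.7 also bundles six as django.utils.six, which offers a similar text_type).

```python
import io
import os
import sys
import tempfile

# text_type is a hypothetical alias for "the unicode string type on this Python"
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 - only evaluated on Python 2


def read_text(path, encoding="utf-8"):
    """Read a whole file as text (not bytes) on Python 2.7 and 3."""
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Round trip through a temporary file to show the behaviour
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"\xe9l\xe9ment")

content = read_text(path)
print(isinstance(content, text_type))
```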

Version specific code


However in some cases you may need to write code that is specific to 2 or 3.
If that occurs you can either try the latest version's code first and fall back to anything else (cross fingers)

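try:
    # latest version compatible code (e.g. Python 3 - Django 1.7)
    # (urlparse is just an illustration of a module that moved)
    from urllib.parse import urlparse
except ImportError:
    # older version compatible code (e.g. Python 2 - Django < 1.7)
    from urlparse import urlparse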

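Or you can use specific version targeting ...

import sys
import django

# django.VERSION is a tuple, so the first two items can be compared directly
django_version = django.VERSION[:2]

if sys.version_info[0] == 3 and django_version == (1, 7):
    pass  # latest version
elif sys.version_info[0] == 2 and django_version == (1, 6):
    pass  # older django version
else:
    pass  # older version


where ...

django.VERSION -> (1, 7, 1, 'final', 0)
django.get_version() -> '1.6' or '1.7.1' (a string, so its parts need int() before comparing with numbers)
sys.version_info -> sys.version_info(major=3, minor=4, micro=0, releaselevel='final', serial=0)
(a named tuple, not a function or a dictionary, so use sys.version_info[0] or sys.version_info.major)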
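
If you only have a version string to go on, a minimal sketch of the comparison might look like the following. The helper names version_tuple and pick_branch are hypothetical, and the branch labels just mirror the comments in the snippet above.

```python
import re
import sys


def version_tuple(version_string):
    """Turn '1.7.1' or '1.6' into a (major, minor) tuple of integers."""
    numbers = re.findall(r"\d+", version_string)
    return tuple(int(number) for number in numbers[:2])


def pick_branch(django_version_string, py3=(sys.version_info[0] == 3)):
    """Return a label for the code path to use (labels are illustrative only)."""
    django_version = version_tuple(django_version_string)
    if py3 and django_version >= (1, 7):
        return "latest"
    elif not py3 and django_version == (1, 6):
        return "older django"
    return "older"


print(pick_branch("1.7.1", py3=True))
```

Comparing tuples of integers avoids the trap of comparing '7' with 7, which is never equal.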

Summary

So how did I get on with my first egg, django-csvimport ? ... it actually proved quite time consuming since the csv.reader library was far more sensitive to bad character encoding in Python 3 and so a more thorough manual alternative had to be implemented for those important edge cases - which the tests are aimed to cover. After all if a CSV file is really well encoded and you already have a model for it - it hardly needs a pesky third party egg for CSV imports - just a few django shell lines using the csv library will do the job.
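
For that well encoded case, the few lines might look something like this sketch. It assumes Python 3 and a CSV whose header row matches the model's field names; rows_from_csv is a made up helper, and the model in the comment is hypothetical.

```python
import csv
import io


def rows_from_csv(text_stream):
    """Yield one dict per CSV row, keyed by the header line."""
    for row in csv.DictReader(text_stream):
        yield dict(row)


# An in-memory file stands in for open('items.csv', encoding='utf-8')
sample = io.StringIO(u"name,price\nshoe,10\nlamp,25\n")
rows = list(rows_from_csv(sample))
print(rows)

# In a django shell each row could then be saved with something like
# Item.objects.create(**row), where Item is your own (hypothetical) model.
```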







Thursday, July 3, 2014

Spring MVC setup on Ubuntu

Recently setting up Spring MVC on Ubuntu 14 with Netbeans wasn't entirely obvious for a newbie, so I thought I would document it in case it saved somebody 10 minutes!


First install Apache and Tomcat, if you haven't got them already...

sudo apt-get install apache2


sudo apt-get install tomcat7 tomcat7-docs tomcat7-admin tomcat7-examples

You should also have the default OpenJDK for Tomcat, plus the Ant build tool and git

sudo apt-get install default-jdk ant git

Edit tomcat-users.xml, because Netbeans requires a user with the manager-script role
(NOTE: you shouldn't give the same user all these roles in a production Tomcat!
Also note that these manager roles have changed from Tomcat 6)

sudo emacs /etc/tomcat7/tomcat-users.xml


<tomcat-users>
  <role rolename="manager-gui"/>
  <role rolename="manager-script"/>
  <role rolename="manager-jmx"/>
  <role rolename="manager-status"/>
  <role rolename="admin-gui"/>
  <role rolename="admin-script"/>
  <user username="admin" password="admin" roles="manager-gui,manager-script,manager-jmx,manager-status,admin-gui,admin-script"/>
</tomcat-users>


You should restart Tomcat after editing this ...

sudo service tomcat7 restart

Now you should be able to go to http://localhost:8080 and see

It works !

If you're seeing this page via a web browser, it means you've setup Tomcat successfully. Congratulations! ...

Click on the link to the manager and get the management screen

If the login fails - reinstall apache and tomcat - it worked for me!

For Netbeans to find Apache Tomcat OK you have to put the config directory where it expects it ...

sudo ln -s /etc/tomcat7/ /usr/share/tomcat7/conf

Note that the Tomcat base location, i.e. where the deploy directory lives, is

/var/lib/tomcat7

Now install Netbeans, latest version is 8, either by download and install or

sudo apt-get install netbeans

Start up Netbeans and go to Tools > Plugins

Pick the Available plugins tab

Search for web and tick Spring MVC - plus any others you fancy!

Restart Netbeans

Add a new project

  1. Choose New Project (Ctrl-Shift-N; ⌘-Shift-N on Mac) from the IDE's File menu. Select the Java Web category, then under Projects select Web Application. Click Next.
  2. In Project Name, type in HelloSpring. Click Next.
  3. Click the Add... button next to the server drop down
  4. Select the Apache Tomcat or TomEE server in the Server list, click Next
    Enter Server Location: /usr/share/tomcat7
    Enter the username and password from your tomcat-users.xml above and untick the create user box. If everything is working then it will accept this and add Tomcat to your server drop down list
    (it shouldn't need to try to add the user unless that user isn't already properly set up with the manager-script role in Tomcat)
  5. In Step 4, the Frameworks panel, select Spring Web MVC.
  6. Select Spring Framework 3.x in the Spring Library drop-down list. 

Click Finish and you should have a skeleton Spring MVC project. Pressing the Play button should build it and run it up, then launch your chosen browser with the home page of that project, served via the Apache Tomcat you have set up.
Any changes should get auto-deployed and popped up in the browser again by pressing play.


Friday, May 2, 2014

Lessons learned from setting up a website on Amazon EC2

I recently got involved with helping someone sort out their website on an Amazon EC2 instance. It had been a few years since I had needed to do anything with EC2, and I realised that I was a novice in this world. It raised a number of issues related to deploying to EC2 and to performance.

So I thought it may be useful to run through them for any other EC2 novices who are asked to do something similar, and want to learn from my rather blundering progress through this :-)

Apologies to those of you who are already well familiar with EC2 for covering some of the basics.

The system, moodpin.co.uk, was based on a commercial PHP application, Pintastic.
This allows you to set up a site like pinterest.com or wanelo.com
These sorts of sites are for creating subject specific photo sharing social media systems, so like Instagram, Picasa etc. but focussed around communities of shared (usually commercial) interest. For example buying shoes, interior decor etc.
The common UI that they tend to present is big scrolling pages of submitted images related to topics, for sharing, comment and discussion.

So this system sends out a lot of notification emails, involves displaying hundreds of images per page - the visual pin board - and to help with performance has custom caching built in - triggered by cron jobs.

Hence we have a number of cron jobs with the caching ones running every couple of minutes. To me this appeared a pretty crude caching mechanism - but my job was not to rewrite the application, but just tweak the code and get it all running OK.
The code mainly uses a standard MVC approach like everything else these days!

So, demonstrating how outdated my knowledge of EC2 and of this application was, I thought OK - first of all, what platform is it? It was Amazon's own Linux. This uses yum rather than apt for package installs, so as distros go it's perhaps more Red Hat-like than Debian.

For those unfamiliar with the basics - go to Amazon Web Services and sign up!
You can then choose to add some of the 40 odd different services that are available under the AWS umbrella.

Once you have signed up to a few of these, you get a management console that links to a control dashboard for each service. The first stop is usually the one with the compute instances on, EC2. From there you can pick an AMI (i.e. operating system image) and a region - e.g. US West (Oregon) - and use them to create a new instance. Add an SSH key pair for shell access, download the pem file, then fire the instance up so you can ssh into your new Amazon box.

So the client wanted the usual little tweaks to PHP code and CSS - easy stuff, it's just web development ... done in a jiffy (well, after digging through the MVC layers, templating language, cache issues and CSS inheritance etc. for a fairly complex PHP app you have never come across before, when PHP is not exactly your favourite language ... jiffyish maybe)
Then we got to the more SysAdmin related requests ... let's just say I probably shouldn't rush out and buy a DevOps tee-shirt just yet ...

'Get email working'

  1. Try to send an email from the web application, then write a plain PHP script that just sends a test email, then just run mail from the Linux command line ... Got it, there is no MTA installed!
  2. Install an MTA - sendmail. Go back up that stack of actions and they are all working ... hurray, that was easy.
  3. A week or so later ... 'emails stopped working'
  4. Go back to step 1 and yep - emails have stopped working
  5. Look at the mail logs and see what the problem is.
  6. Realise that there are masses of emails being sent out ... but all of it is bouncing back as unverified.
  7. Think ... wow that pintastic site's notifier is busy - must be getting lots of traffic *
  8. So why has Amazon started bouncing all the email?
  9. Search Amazon's docs. Amazon has a very minimal test quota allowed for email. Once that quota is filled, unverified email will be blocked.
  10. Amazon has historically been one of the main sources of spam machines. That history means it has had to set up a much more elaborate mechanism for validating email than most hosting companies, and it no longer allows direct emailing from EC2 boxes (apart from minimal test quotas)
  11. So what we need to do is set up our mail to be sent via the Amazon SES service - add the SES service and enable it
  12. So now we need to send authorised emails to the Amazon SES gateway, which will then forward them on to the outside world
  13. Try to get sendmail to send authenticated emails. Follow the guide, but it continues to bounce with authentication failure. Give up and install postfix, follow the 20 steps of setting up the SASL password etc., and eventually it doesn't bounce with authentication errors - hurray!
  14. But the email still bounces. So we need to verify all our sending email addresses - managed by the SES console - or use DKIM to get the whole domain we are sending from verified and signed.
  15. Modify the emails used by the sending software to ones which we can receive and validate - send and validate them. Our emails are working again.
  16. Leave it a few days, and we are not sending email anymore, boooo!
  17. Check all the SES documentation. Surprise, surprise, SES also has quota limits that are for test level only, and you have to formally apply to get those limits lifted.
  18. Contact the client and get him to make a formal request for quota lifting on his account.
  19. * As part of the investigation check that email log a little more closely. It seems rather large, and we seem to be using up our quotas really quickly ... ah, the default setup for unix cron sends an email for every job that returns text. The pintastic cache job returns text, so we are sending a pointless email every two minutes ... or trying to ... whoops. Make sure no cron or other unix system command is acting as a spam bot.
  20. A few days later - Amazon say our quota has been lifted
  21. Our emails have started sending again ... and they are still sending today !!!

Client's response: OK thanks, by the way since we added all the start up data, i.e. uploaded images, the site takes at least two minutes to render the home page - or times out altogether.
Hmmm I did kinda notice that ... but hey he hadn't asked me to make the site actually usable speed wise ... until now!

'Why is the site, really, really slow?'


Hmmm wow it really is slow, lots of the time it just dies, that PHP cache thingy can't be doing much, so what's the problem?

  1. Let's look at the web site, wow it takes 5 minutes for the page to come back ... so this isn't exactly Apache bench territory ... run up a few tabs looking at the home page ... and it starts just returning server timeouts.
  2. So what's happening on the server ... what's killing the box ... top tells us that it's Apache killing us here - with 50 odd processes spawning and sucking up all memory and CPU.
  3. So we check out our Apache config and it's the usual PHP orientated config of MPM prefork. But what are the values set ... they are for a great big multiprocessor cadillac of a machine, whilst ours is more of a smart car in its scale.
  4. The lesson is that Amazon AMIs are certainly not smart enough to have different image configs for the different hardware specs of the instances they provide. So it appears they default their configs to suit the top of the range instances (since I guess they cost the most). If you have a minimal hardware spec box ... you should reconfigure hardware related parameters for the software you run on it ... or potentially it will fail.
  5. Slash all those servers, clients etc. values to the number of servers and processes the server can actually deliver. Slightly trial and error here ... but eventually we got to MaxClients 30 instead of 500 etc. and gave it a huge timeout.

    <IfModule prefork.c>
    StartServers       4
    MinSpareServers    2
    MaxSpareServers  10
    ServerLimit      30
    MaxClients       30
    MaxRequestsPerChild  4000
    </IfModule>
  6. Now let's hammer our site again ... hurray it doesn't completely fall over ... one day it may return a page, but it's still horribly horribly slow, i.e. 3 minutes absolute top speed - and the more home page requests, the slower they get.
  7. So let's get some stats, accessing the page with the browser's web dev network tools. What's taking the time here? Hmmm web page a second, not great but acceptable, JS and CSS 0.25 sec, OK. Images hmmm images hmmm for the home page particularly ... 3-6 minutes ... so basically unusable.
  8. So time to bite the bullet. We know Apache can be slower at serving static files if it's not optimised for it - especially if resources are limited (its processes have a bigger memory overhead). That's why the Apache foundation has another web server, Apache Traffic Server, for that job
  9. But what's the standard static server (that's grabbed half of Apache's share of the web in the last few years)? Yep, nginx
  10. So let's set up the front end of our site as nginx acting as a reverse proxy to Apache, which just does the PHP work, with nginx serving all images. So modify Apache to just serve on 8080 on localhost and flip the site over to an nginx front end, with the following nginx conf ...

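    server {
            listen       80;
            server_name  moodpin.co.uk;

            # uploaded and cached files are served directly by nginx
            # (a regex match needs ~ here, since ^~ only does a literal prefix match)
            location ~ ^/(cache|cms|uploads) {
                     root   /var/www/html/;
                     expires 7d;
                     access_log  /var/log/nginx/d-a.direct.log;
            }

            # static file types are served directly by nginx
            location ~* \.(css|rdf|xml|ico|txt|gif|jpg|png|jpeg)$ {
                     expires 365d;
                     root  /var/www/html/;
                     access_log  /var/log/nginx/d-a.direct.log;
            }

            # everything else is proxied to Apache for the PHP work
            location / {
                    proxy_pass         http://127.0.0.1:8080/;
            }
    }
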
    Wow, wow, so take that 3-6 minutes and replace it with 1-2 seconds.
  11. So how many images on the home page - about 150, plus more with scrolling ... so that means we have a site that is on average under 0.5% dynamic code driven content and 99.5% static content/requests per page.
    That is a very very static site - hence the 100 x faster speed!
  12. So there you go client, take that souped up smart car and go
  13. Client replies ... ummm site's down - server proxy timeout error
  14. Go to Google and check. So we have to make sure that nginx has timeout settings greater than Apache's - and the nginx default timeout is 60 seconds
  15. Set the nginx _timeout settings to 10 minutes ... sounds bad, but try the site and it consistently delivers pages in 3 seconds or so. I assume that the scrolling, request-and-update nature of the app's pages makes the timeout required much longer than the apparent time Apache is delivering PHP within?
  16. Show the client again, he's happy.
  17. Few days later ... this bit of the site's not working now
  18. Check the code, and discover that there is a handful of javascript files used by the system that are not really static - they are PHP templates generating javascript that only appears static. Remove js file types from the list of files above in the nginx config. Hurray, generated javascript is served from Apache PHP now. Bit of site works again
  19. OK we are done ... don't run Apache bench against the site ... if the client actually gets any users and it can't cope - tell him to upgrade his instance.
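
For the prefork sizing in step 5 of this list, I got there by trial and error, but a common rule of thumb is to divide the memory you can spare by the memory each Apache + PHP process uses. Here is a minimal sketch of that arithmetic. The function name and all the figures are hypothetical, so measure your own box with top before trusting any number.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Rule-of-thumb upper bound on prefork processes that fit in RAM."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    # Memory left once the OS, database etc. have had their share
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_process_mb


# Hypothetical figures: a 1700 MB instance, 500 MB kept back for everything
# else, and 40 MB per Apache + PHP process as seen in top.
print(estimate_max_clients(1700, 500, 40))
```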

    I hope my tales of devops debuggery are useful to you, Bye!
       

    Monday, January 13, 2014

    Postgres character set conversion woes

    I had to struggle with sorting out some badly encoded data in Postgresql over the last day or so.
    This proved considerably more hassle than I expected, partly due to my ignorance of the correct syntax to use to convert textual data.

    So on that basis I thought I would share my pain!

    There are a number of issues with character sets in relational databases.

    For a Postgres database the common answers often relate to fixing the encoding of the whole database. So if this is the problem the fixes are often just a matter of setting your client encoding to match that of the database. Or to dump the database then create a new one with the correct encoding set, and reload the dump.

    However there are cases where the encoding is only problematic for certain fields in the database, or where you are creating views via database links between two live databases of different encodings - and so need to fix the encoding on the fly via these views.

    Ideally you have two databases that are both correctly encoded, but just use different encodings.
    If this is the case you can just use convert(data, 'encoding1', 'encoding2') for the relevant fields in the view.

    Then you come to the sort of case I was dealing with. Where the encoding is too mashed for this to work. So where strings have been pushed in as raw byte formats that either don't relate to any proper encoding, or use different encodings in the same field.

    In these cases any attempt to run a convert encoding function will fail, because there is no consistent 'encoding1'

    The symptom of such data is that it fails to display. So sometimes it's difficult to notice until
    the system / programming language that is accessing the data throws encoding errors.
    In my case the pgAdmin client failed to display the whole field, so although the field appears blank, matches with like '%ok characs%' or length(field) still work OK. Meanwhile the normal psql command displayed all the characters except for the problem ones, which were just missing from the string.

    This problem has two solutions:

    1. Repeat the dump and rebuild approach with the correct encoding, but write a custom script in Perl, Python or the like to fix the mashed encoding in the dump - assuming that the mashing is not so entirely random as to be unfixable via an automated script*. If it is - then you either have to detect and chuck away bad data - or manually fix things!

    2. Fix the problem fields via PL/pgSQL, PL/Python or PL/Perl functions that process these to replace known problem characters in the data.

    I chose to use PL/pgSQL since I had a limited set of these problem characters, so didn't need the full functionality of Python or Perl. However in order for PL/pgSQL to be able to handle the characters for fixing, I did need to turn the problem fields into raw byte format.

    I found that the conversion back and forth to bytea was not well documented, although the built in functions to do so were relatively straightforward...

    Text to Byte conversion => text_field::bytea

    Byte to Text conversion => encode(text_field::bytea, 'escape')

    So employing these for fixing the freaky characters that were used in place of escaping quotes in my source data ...

    CREATE OR REPLACE FUNCTION encode_utf8(text)
      RETURNS text AS
    $BODY$
    DECLARE
        encoding TEXT;
    BEGIN
        -- single quote as superscript a underline and Yen characters
        -- (replace the first \xaa byte with an ASCII single quote, \x27)
        IF position('\xaa'::bytea in $1::TEXT::BYTEA) > 0 THEN
            RETURN encode(overlay($1::TEXT::BYTEA placing E'\x27'::bytea from position('\xaa'::bytea in $1::TEXT::BYTEA) for 1), 'escape');
        END IF;

        -- double quote as capital angstroms character
        -- (replace the first \xa5 byte with an ASCII double quote, \x22)
        IF position('\xa5'::bytea in $1::TEXT::BYTEA) > 0 THEN
            RETURN encode(overlay($1::TEXT::BYTEA placing E'\x22'::bytea from position('\xa5'::bytea in $1::TEXT::BYTEA) for 1), 'escape');
        END IF;
        RETURN $1;
    END;
    $BODY$
      LANGUAGE plpgsql;

    Unfortunately the Postgres byte string functions don't include an equivalent to a string replace, and the above function assumes just one problem character per field (my use case), but it could be adapted to loop through each character and fix it via use of overlay.
    So the function above allows for dynamic fixing of improperly encoded text in views onto a legacy database that is still in use - via a database link from a current UTF8 database.

    * For example in Python you could employ chardet to autodetect possible encoding and apply conversions per field (or even per character)
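
As a minimal sketch of the scripted approach, using only the standard library rather than chardet: decode as UTF-8, and only when a byte fails to decode swap it for the character it was meant to be. The mapping just mirrors the two bytes handled in the function above, and the names REPLACEMENTS, fix_quotes and clean are made up for this example. It assumes Python 3 and that the rest of the field is valid UTF-8.

```python
import codecs

# Hypothetical mapping, mirroring the two problem bytes fixed in the SQL function
REPLACEMENTS = {0xAA: u"'", 0xA5: u'"'}


def _fix_bad_bytes(error):
    """Error handler: swap known bad bytes, mark unknown ones with U+FFFD."""
    bad = bytearray(error.object[error.start:error.end])
    fixed = u"".join(REPLACEMENTS.get(byte, u"\ufffd") for byte in bad)
    # Return the replacement text and the position to resume decoding from
    return fixed, error.end


codecs.register_error("fix_quotes", _fix_bad_bytes)


def clean(raw_bytes):
    """Decode bytes as UTF-8, repairing every stray problem byte."""
    return raw_bytes.decode("utf-8", "fix_quotes")


print(clean(b"it\xaas a \xa5test\xa5"))
```

Because the handler only fires on bytes that are invalid UTF-8, a legitimate multi-byte character that happens to contain \xa5 or \xaa is left alone, and every occurrence in the field gets fixed rather than just the first.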

    Monday, January 6, 2014

    WSGI functional benchmark for a Django Survey Application

    I am currently involved in the redevelopment of a survey creation tool that is used by most of the UK University sector. The application is being redeveloped in Django, creating surveys in Postgresql and writing the completed survey data to Cassandra.
    The core performance bottleneck is likely to be the number of concurrent users who can simultaneously complete surveys. As part of the test tool suite we have created a custom Django command that uses a browser robot to complete any survey with dummy data.
    I realised when commencing this WSGI performance investigation that this functional testing tool could be adapted to act as a load testing tool.
    So rather than just getting general request statistics, I could get much more relevant survey completion load data.

    There are a number of more thorough benchmark posts of raw pages using a wider range of WSGI servers - eg. http://nichol.as/benchmark-of-python-web-servers - however they do not focus so much on the most common ones that serve Django applications, or address the configuration details of those servers. So though less thorough, I hope this post is also of use.

    The standard configuration for running Django in production is the dual web server set up. In fact Django is pretty much designed to be run that way, with contrib apps such as staticfiles provided to collect images, javascript, etc. for serving separately from the code. This recognizes that in production a web server optimized for serving static files is going to be configured very differently from one optimized for a language runtime environment, even if they are the same web server, eg. Apache. So ideally it would be delivered via two differently configured, separate Apache servers: a fast and light static configured Apache on high I/O hardware, and a mod_wsgi configured Apache on large memory hardware. In practice Nginx may be easier to configure for static serving or, for a larger globally used app, perhaps a CDN.
    This is no different from optimising any web application runtime, such as Java Tomcat. Separate static file serving always offers superior performance.

    However these survey completion tests are not testing static serving, since simpler load tests suffice for that purpose. They are testing the WSGI runtime performance for a particular Django application.

    Conclusions

    Well you can draw your own, for the load you require on a given set of hardware resources! You could of course just upgrade your hardware :-)

    However clearly uWSGI is best for consistent performance at high loads, but
    Apache MPM worker outperforms it when the load is not so high. uWSGI's advantage at high load is likely to be due to the slightly higher memory per thread that Apache uses compared to uWSGI

    Using the default Apache MPM process may be OK, but can make you much more open to DoS attacks, via a nasty performance brick wall, whilst daemon mode may result in more timeout fails as overloading occurs.

    Gunicorn is all Python so easier to set up for multiple django projects on the same hardware, and performs consistently across different loads, if not quite as fast overall.

    I also tried a couple of other python web servers, eg. tornado, but the best I could get was over twice as slow as these three servers. They may well have been configured incorrectly, or be less suited to Django. Either way I did not pursue them.

    Oh and what will we use?

    Well probably Apache MPM worker will do the trick for us, with a separate proxy front-end Apache configured for static file serving.
    At least that way, it's all the same server that we need to support, and one that we are already well experienced in. Also our static file demands are unlikely to be sufficient to warrant use of Nginx or a CDN.

    I hope that these tests may help you, if not to make a decision, then maybe at least to decide to try out a few WSGI servers and configs for yourself. Let me know if your results differ widely from mine, especially if there are some vital performance related configuration options I missed!

    Running the functional load test

    To run the survey completion tool with a given number of concurrent users and collect stats on this, I wrapped it up in test scripts for locust.

    So each user completes one each of seven test surveys.
    The locust server can then be handed the number of concurrent users to test with and the test run fired off for 5 minutes, over which time around 3,000 to 4,000 surveys are completed.

    The numbers of concurrent users tested with were 10, 50 and 100.
    With our current traffic, peak loads will probably be around the 20 users mark, with averages of 5 to 10 users. However there are occasional peaks higher than that. Ideally with the new system we will start to see higher traffic, where the 100 benchmark may be of more relevance.
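
This is not the locust script itself, but the shape of such a run can be sketched with the standard library alone (Python 3): a pool of threads stands in for the concurrent users, each completing seven surveys and counting fails. The complete_survey callable is hypothetical, a stand-in for the browser robot, and fake_survey exists only to make the sketch runnable.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SURVEYS_PER_USER = 7  # each simulated user completes one each of seven surveys


def run_user(complete_survey):
    """Complete every survey once; return (seconds_taken, fail_count)."""
    fails = 0
    start = time.time()
    for survey_id in range(SURVEYS_PER_USER):
        try:
            complete_survey(survey_id)
        except Exception:
            fails += 1
    return time.time() - start, fails


def run_load(complete_survey, concurrent_users):
    """Run all users at once; return (surveys_attempted, total_fails)."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        results = list(pool.map(run_user, [complete_survey] * concurrent_users))
    total_fails = sum(fails for _, fails in results)
    return len(results) * SURVEYS_PER_USER, total_fails


def fake_survey(survey_id):
    # Stand-in for the browser robot: survey 3 always fails, to show the counting
    if survey_id == 3:
        raise RuntimeError("timeout")


print(run_load(fake_survey, 10))
```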

    Fails

    A number of bad configs for the servers produced a lot of fails, but with a good config these seem to be very low. All 3 x 5 minute test runs for each setup created around 10,000 surveys, and these are the actual numbers of fails in 10,000,
    so insignificant perhaps ...

    Apache MPM process = 1
    Apache MPM worker = 0
    Apache Daemon = 4
    uWSGI = 0
    Gunicorn = 1

    (so the fastest two configs both had no fails, because neither ever timed out)

    Configurations

    The test servers were run on the same virtual machine, the spec of which was
    4 x Intel 2.4 GHz CPUs with 4 GB RAM
    So optimum workers / processes = 2 * CPUs + 1 = 9
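
That rule of thumb is easy to compute for whatever machine you are on. A minimal sketch follows, where recommended_workers is just a name I made up for the example.

```python
import multiprocessing


def recommended_workers(cpu_count=None):
    """Apply the 2 * CPUs + 1 rule of thumb for worker processes."""
    if cpu_count is None:
        # Fall back to the number of CPUs on the current machine
        cpu_count = multiprocessing.cpu_count()
    return 2 * cpu_count + 1


print(recommended_workers(4))
```

It is only a starting point, the settings below were still tuned by hand from there.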

    The following configurations were arrived at by tinkering with the settings for each server until optimal speed was achieved for 10 concurrent users.
    Clearly this empirical approach may result in very different settings for your hardware, but at least it gives some idea of the appropriate settings - for a certain CPU / memory spec. server.

    For Apache I found that whether or not WSGIApplicationGroup was set was important, hence its inclusion below. There was a 20% improvement with it on for MPM prefork or daemon mode, and with it off for MPM worker mode.

    Apache mod_wsgi prefork

    WSGIScriptAlias / /virtualenv/bin/django.wsgi
    WSGIApplicationGroup %{GLOBAL}

    Apache mod_wsgi worker

    WSGIScriptAlias / /virtualenv/bin/django.wsgi

    <IfModule mpm_worker_module>
    #  ThreadLimit    1000
        StartServers         10
        ServerLimit          16
        MaxClients          400
        MinSpareThreads      25
        MaxSpareThreads     375
        ThreadsPerChild      25
        MaxRequestsPerChild   0
    </IfModule>
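The worker MPM numbers above are related to each other: Apache caps MaxClients at ServerLimit * ThreadsPerChild, so the values need to agree. A quick sanity check might look like this; the dictionary and function names are illustrative only.

```python
# Values copied from the mpm_worker_module block above.
mpm_worker = {
    "StartServers": 10,
    "ServerLimit": 16,
    "MaxClients": 400,
    "MinSpareThreads": 25,
    "MaxSpareThreads": 375,
    "ThreadsPerChild": 25,
}


def max_threads(config):
    """Upper bound on request threads: child processes * threads per child."""
    return config["ServerLimit"] * config["ThreadsPerChild"]


def initial_threads(config):
    """Threads available straight after start up."""
    return config["StartServers"] * config["ThreadsPerChild"]


print("thread ceiling:", max_threads(mpm_worker))
print("threads at start up:", initial_threads(mpm_worker))
print("MaxClients fits:", mpm_worker["MaxClients"] <= max_threads(mpm_worker))
```

Here 16 processes of 25 threads give exactly the 400 that MaxClients asks for, and the server starts with 250 threads available.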

    Apache mod_wsgi daemon

    WSGIScriptAlias / /virtualenv/bin/django.wsgi
    WSGIApplicationGroup %{GLOBAL}

    WSGIDaemonProcess testwsgi \
        python-path=/virtualenv/lib/python2.7/site-packages \
        user=testwsgi group=testwsgi \
        processes=9 threads=25 umask=0002 \
        home=/usr/local/projects/testwsgi/WWW \
        maximum-requests=0

    WSGIProcessGroup testwsgi

    uWSGI

    uwsgi --http :8000 --wsgi-file wsgi.py --chdir /virtualenv/bin \
          --workers=9 --buffer-size=16384 --disable-logging


    Gunicorn

    django-admin.py run_gunicorn -b :8000 --workers=9 --keep-alive=5
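Gunicorn can also read its settings from a Python configuration file, passed with its -c option, which may be easier to keep under version control than a long command line. The sketch below mirrors the flags above; the file name is hypothetical and I did not use a config file in these tests, but bind, workers and keepalive are Gunicorn's own setting names.

```python
# gunicorn_conf.py (hypothetical file name)
bind = ":8000"   # same as -b :8000
workers = 9      # 2 * CPUs + 1 for the 4 CPU test machine
keepalive = 5    # seconds to wait for requests on a keep-alive connection
```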


    Thursday, November 21, 2013

    Django Cardiff User Group

    Last night I went to the second meeting of the Django Cardiff User Group.

    This is a sister group to DBBUG, the Bristol based one that I have been attending for the last 5 years. It was organised by Daniele Procida, who started attending DBBUG events a few years ago and has now decided to spread the word over the Severn, in Wales.

    He is also organising the first UK Django conference in a couple of months, https://djangoweekend.org/, so it's good to see one open source / Python group be the inspiration for spawning another, and one that is perhaps more organisationally active than its progenitor.

    The evening was fun, and it was good to meet and chat with Djangonauts over the border.

    Andrew Godwin, Django core developer / release manager, gave us an update on all the new goodies to be added in Django 1.7.
    This release is largely about really sorting out the niggling issues with relational database features, and the low level ORM handling of them.
    It sees rationalisation of transaction handling with the use of nestable atomic statements, the addition of generic connection pooling, and handling of composite keys.

    Daniele demonstrated how to fly a helicopter (a toy one) via the Python command line, although Andrew seemed rather more adept at landing it safely. I gave a little reprise of a talk introducing DBBUG and how a developer can follow the road to their own open source contributions.

    Thanks to everyone involved, I hope to get to the Django weekend too.

    The ten commandments of software procurement

    For a medium to large scale organisation with its own IT department, I have found that the following truths for software procurement apply in today's market. Yet they are usually poorly understood by staff in organisations outside the software sector, who often view the world through antique pre-1990 glasses, from before the significant impact of web based providers and the mixed economy of revenue models of modern software companies ...
    1. Software is like any other creative output: it differs radically in quality, modernity and appropriateness, and this is entirely unrelated to its cost. That is partly because the majority of today's leading software development companies are internet companies that do not rely on charging for software for their revenue.
    2. So whether or not software is charged for directly via a licensing model is unrelated to whether it is mostly open source or closed source / commercial. Some software is no longer purchasable, or the paid for solutions are of too poor quality to be viable compared to the free ones. In such cases other, non-financial trading decisions must be part of the procurement arsenal, such as policies on data release.
    3. Whether something is open or closed source is entirely irrelevant to its quality, scalability or any other attribute you care to name. These days any software stack is likely to be a mix of both.
      However, given that source, tests, community and commit rate can all be checked for the former, it is far easier to avoid picking a lemon with open source (not that a non-technical organisation tends to use any of these core indicators for procurement assessment).
    4. Software is basically like literature - there are your Barbara Cartlands and your Shakespeares - unfortunately fewer people are able to read it to work out what quality it is, so it's a book which is generally just judged by its cover - hence the common misconception that software is all roughly the same - or that its quality relates to its cost.
    5. However, the more generic a software application is, the more likely it is that you get better quality for a lower cost - standard economy of scale.
      Hence Google GMail / Microsoft Office / open source Apache - are good quality - because they are large scale generic applications.  
      The more specific an application is, the more likely the software (whether open source or commercial) will have been put together by a core group of at most 3 or 4 developers, hence have fewer quality control methods applied, be more buggy and risk being generally of a lower standard.
    6. If the IT Services department of your organisation is not powerful enough to tell the users what they are going to get, despite what they want, then it is common for many of the systems it deploys to require significant customisation. The more specific they are, the more the customisation.
      Customisation of outsourced, closed source products is likely to incur significantly greater time and development cost than customisation of open source ones, whether done in house or outsourced. If it is done in house then, unless the software has a well designed API, docs etc. (i.e. it is a widely used generic system from a major company), you usually find that you can only do black box integration and wrapper coding, or resort to breaking license agreements by decompiling. All of which is difficult to maintain.
      If outsourced, then the code may be open, covered by a test suite and documented within the supplying company, but you are likely to be paying the company around 3 times your in-house wage cost for a junior developer's customisation / bug fixing time.
    7. For historical reasons, some types of software have far superior products in one of these camps than in the other ... So open source finance software is poor, closed source web CMS software and repository software is poor, etc.
    8. Non-technical companies will go through a 5-10 year cycle of outsourcing as much software as possible, then auditing consultancy costs, then ballooning internal development to cut costs, then deciding too much development is in house and going back to outsourcing again. This cycle wastes a lot of money, because it fails to recognise that a stable, highly selective mixed economy of outsourced, open source, commercial and in-house software is the ideal balance of functionality vs. cost.
    9. Buying mix and match products from integrated product suites is a recipe for high cost, e.g. MS Exchange email and Google Docs, rather than buying everything from one or other supplier.
    10. Lastly and most importantly, a non-technical organisation always makes its software procurement decisions for political reasons*, never technical ones. This invariably means that it makes decisions that are significantly more costly, difficult to maintain and less well featured than it could achieve using a purely technical assessment process.
      Usually they will also fail to have processes to properly trial run alternative products in a realistic manner, or to audit selections once the initial purchase is made. This may partly be because, although auditing may save significant costs in the long run, it does introduce a means by which a wrong choice can be flagged up. Unfortunately it is often less embarrassing to make do with a bad choice until its end of life than to admit a failure. Even though failure, and acceptance of it as part of the process, is essential to the delivery of quality (rather than make do) systems.

    Thank you ... rant over :-)

    * political reasons - The salesman managed to persuade someone suitably senior, and technically clueless enough to believe them. This usually goes in tandem with the company software team's response ... the salesman promised them it did what?? ... make damn sure that isn't in the contract / licensing agreement.




    Site code, Google Apps integration and design - Ed Crewe 2011