Ed Crewe Home

Thursday, 15 November 2012

Cookie law, Cookieless and django tips.

django-cookieless

Last week I released a new add-on for Django, django-cookieless. It was a relatively small feature required for a current project, and since it was such a generic package it seemed ideal for open sourcing as a separate egg. It made me realise that I hadn't released a new open source package for well over a year, so this one is certainly long overdue in that sense.

Cookie Law

It is also overdue in another sense: EU cookie law has been in force since May 2011, so legally any sites used in Europe that set cookies which are not strictly necessary for the functioning of the site must now request the user's permission before doing so. Of course it remains to be seen whether any practical enforcement measures will happen, although they were due this summer in the UK, for example. Hence many of the first rush of JavaScript pop-up style solutions have come and gone, as a result of user confusion. But for public sector clients particularly, it is certainly simpler to just not use cookies if they are not technically required. It may also, at least, make developers rather less blasé about setting cookies.

Certainly most people would prefer not to see their browsers filled with deliberate user tracking and privacy invasive cookies that are entirely unrelated to the site's functionality, in the same way most of us don't like being tracked by CCTV everywhere we go. Unfortunately, the current law doesn't have a good technical solution behind it, so it may well founder over time. Cookie control is too esoteric for ordinary users, and even with easy browser-based privacy configuration, any technical solution is problematic, because a single cookie can be used both to protect privacy (in terms of security - e.g. a CSRF token) and to invade it. Where those distinctions lie is entirely down to the specific application's usage. Invasive methods can also be implemented via other session maintenance tools, such as URL rewriting, yet because no data is written to the user's browser they fall outside the remit of this law. So the law makes little sense currently, and may well be unenforceable.

Perhaps it would have been better to aim laws at encouraging adherence to set standards of user tracking, starting with compliance with the browser's 'Do Not Track' header and perhaps adding some more subtle gradations over time, with the targets of the law being companies whose core business is user tracking for advertising sales etc., starting with Google and working down. Rather than pushing the least transgressive public service sector, as the most likely to comply, into adding a bunch of annoying 'Will you accept our cookies?' pop-ups.

However, even if this law dries up and blows away, for our particular purposes we needed Django to cater for any number of sessions per browser (as well as not using cookies for anonymous users).
Django's default session machinery requires cookies, so it ties a browser to a single session: request.session is set against a cookie. Because django-cookieless provides sessions maintainable by form posts, it automatically delivers multiple sessions per browser.
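
To illustrate the idea, here is a minimal sketch of the approach only - not the actual django-cookieless code - in which the session key is carried in a hidden field of each form and recovered from the POST data on the next request, so each chain of form posts maintains its own session:

# Simplified sketch of the cookieless idea - NOT the django-cookieless implementation.
# The session key lives in a hidden form field instead of a cookie, so one browser
# can hold a separate session per form chain.
import re

from django.conf import settings
from django.utils.importlib import import_module  # Django 1.3/1.4 era


class FormSessionMiddleware(object):
    """ Recover the session from a hidden POST field and re-inject it
        into any forms in the response. """

    field = '_session_key'

    def process_request(self, request):
        engine = import_module(settings.SESSION_ENGINE)
        session_key = request.POST.get(self.field, None)
        request.session = engine.SessionStore(session_key)

    def process_response(self, request, response):
        if hasattr(request, 'session'):
            request.session.save()
            hidden = ('<input type="hidden" name="%s" value="%s" />'
                      % (self.field, request.session.session_key))
            # Crude rewrite: drop the hidden field into every form in the page
            response.content = re.sub(r'(<form[^>]*>)', r'\1' + hidden,
                                      response.content)
        return response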

There are a number of security implications in not using cookies, which revolve around the difficulty of preventing session stealing without them. Given that, django-cookieless has a range of settings to reduce the risk, but even so I wouldn't recommend using it for sessions tied to authenticated users, which could lead to privilege escalation if the session were stolen.

Django Tips

I thought the egg would be done in a day, but in reality it took a few days, due to a number of iterations that were necessary as I discovered a series of features around the lesser known (well, to me) parts of Django. So I thought I would share these below, in case any of the tips I gained are useful ...

  1. The request object life cycle goes through three main states in django:
    unpopulated - the request that is around at the time of process_request type middleware hooks - before it gets passed by the URL handler to decorators and then views.
    partly populated - the request that has session, user and other data added to it (mainly by decorators) and gets passed to a view
    fully populated - the request that has been passed through the view to add its data, and is used to generate a response - this is the one that process_response sees.
  2. I needed to identify requests decorated with my no_cookies decorator at the time of process_request, but the flag it sets has not been set yet. However there is a useful utility to work around this, django.core.urlresolvers.resolve, which when passed a path gives a match object containing the view function to be used, and hence its decorators, if any (see the sketch after this list).
  3. Template tags that use a request get the unpopulated one by default. I needed the request to have the session populated for the option of adding manual session tags - see the tags code. To have the partly populated request in tags, django.core.context_processors.request must be added to TEMPLATE_CONTEXT_PROCESSORS in settings.

  4. The django test framework's test browser is in effect a complex mocking tool that mocks up the behaviour of a real browser; however, like any mock object, it may not exactly replicate the behaviour you want. In my case it only turns on session mocking if it finds the standard Django session middleware in settings. With cookieless it isn't there, because cookieless acts as a replacement for it, and a wrapper to use it for views undecorated with no_cookies. Hence I needed the trick of setting a TESTING flag in settings, to allow flipping cookieless on and off.
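
For example, tip 2 can be put to work in process_request roughly like this (a hedged sketch only; it assumes the no_cookies decorator stamps a no_cookies attribute on the view function, which is not necessarily how the released package does it):

# Sketch of tip 2: resolve the path ourselves to find the view function
# (and hence its decorators) before the URL handler has run.
from django.core.urlresolvers import resolve
from django.http import Http404


class NoCookiesCheckMiddleware(object):

    def process_request(self, request):
        try:
            match = resolve(request.path_info)
        except Http404:
            return None
        # Assumption: the no_cookies decorator sets a flag on the view function
        if getattr(match.func, 'no_cookies', False):
            request.no_cookies = True
        return None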

Tuesday, 23 October 2012

My struggles with closed source CMS

I produce a content migration tool for the Plone CMS, called ilrt.contentmigrator. It wraps up Zope's GenericSetup as an easily configurable tool to export content from Zope's bespoke object database, the ZODB, to a standard folder hierarchy of content, with binary files and associated metadata in XML.

Some time ago I added a converter to push content to Google Sites, and I have recently been tasked with pushing it to a commercial CMS. Unfortunately, rather than a few days' work as before, this has turned into a marathon task, and I am still unsure whether it is achievable, due to political and commercial constraints.

So I thought I should at least document my progress, or lack of it, as a lesson to other naive, open source habituated developers to consider their estimates carefully when dealing with a small closed source code base of which they have no experience.

Plan A - Use the API


So the first approach I assumed would be the simplest was to directly code a solution using "the API".

API is in quotes here since, in common with many small commercial software suppliers, the name API was in fact referring to an automated JavaDoc dump of all their code; there was no API abstraction layer, or external RESTful / SOAP API, to call. It is basically the equivalent of 'read the source' for open source projects, but with the huge disadvantage of only legally having access to read the bare, largely uncommented, class and method names, not to look at the source to see how they worked - or why they didn't.

Also no other customers had previously attempted to write code against the content creation part of the code base.

Anyhow, back to the project: content import code was written and run, but nothing changed via the web front end.

It turns out that without a cache refresh the admin interface does not display anything done via the API, so refreshing it is essential to be able to determine whether changes have occurred.

Similarly if content is not cleared from the waste-basket then it cannot be recreated in the same location, along the lines of a test import scenario.

Having written the code to set up the cache and other API managers and clear them, I discovered that cache refresh doesn't work via the API, and neither does clearing the waste-basket.

The only suggested solution was to turn the CMS off and on again.

Plan B - Use the API and a Robot


Rather than resort to such a primitive approach, I decided to develop a Selenium WebDriver based robot client. This could log into the CMS and run all the sequences of screen clicks that it takes to clear the waste-basket and cache after an API delete has been called.

Eventually all this was in place: now content could be created via the API, and media loaded via the robot (since, again, anything that may use local file system caches or file storage is inoperable via the API).

The next stage was to create the folder hierarchy and populate it with content.

Unfortunately at this point a difficult-to-trace API bug reared its head. If a subfolder is created in a folder via the API, it gets created in a corrupted manner and blocks subsequent attempts to access content in that folder, because the subsection incorrectly registers itself as content but is then found to be missing. After much time spent tracing this bug, the realisation dawned that it would not be viable to create anything but a subset of content objects via the API; everything else would need the robot mixed in to work.

This seemed like a much less maintainable solution, especially since most pages of the CMS had 50 or more JavaScript files linked to them, so only a current-browser WebDriver client robot would function with it at all. Even then, often the only way to get the robot clicks and submits to work was to grab the JavaScript calls out of the source and call the jQuery functions directly with the WebDriver JavaScript engine, as in the sketch below.
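
As an illustration of that last trick, here is a hedged sketch only - the URL, element names and jQuery selector are made up for the example, not taken from the real CMS:

# Sketch of driving a JavaScript-heavy admin screen with Selenium WebDriver,
# falling back to calling the page's own jQuery handlers directly where a
# plain click gets swallowed. All names and URLs here are illustrative.
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://cms.example.org/admin/login')
driver.find_element_by_name('username').send_keys('importrobot')
driver.find_element_by_name('password').send_keys('secret')
driver.find_element_by_name('submit').click()

# Call the jQuery handler via the WebDriver JavaScript engine instead of clicking
driver.execute_script("jQuery('#empty-wastebasket').trigger('click');")
driver.quit()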

Plan C - Use the import tool and a Robot


So having wasted three days tracing a bug to the (closed source) API, it was time to take a step back and think about whether there was realistically any means by which an import tool could be created by a developer outside the company supplying the CMS, i.e. me.

Fortunately the CMS already had an XML export / import tool. So all we needed to do was convert our XML format to the one used by the company, and the rest of the code was their responsibility to maintain.

At first their salesman seemed fine with this solution, so I went away and started on that route, having left the existing code at the point where the sub-folder creation API bug blocks it from working.

However, on trying out the CMS tool, it also failed to work in a number of ways. The problems it currently has are listed below, and my present focus is writing a Selenium based test suite that will perform a simple folder, content and media export and import with it.

Once the tool passes, we have confirmation that the API works (at least within the limited confines of its use within the tool). We can then write a converter for the XML format and a driver for the tool, or even revisit the API + robot route if it is fixed.

Below are the features that need to work, and that the test suite is designed to confirm are functional ...

Content Exporter Issues (status in brackets)

  1. The folder hierarchy has to be exported separately from the content. If you select both at once - it throws an error (minor - run separately)
  2. The hierarchy export appends its data when exported to the same folder, so creating an invalid hierarchy.xml after the first run (minor - could delete on the file system in between) 
  3. Hierarchy export doesn't work. It just creates XML with the top level folder title wrapped in tags containing the default configuration, attributes - but no hierarchy of folders. (blocker - need the hierarchy especially to work, since the sub-folder creation was the blocking bug issue with using the API directly)
  4. Content export only does one part of one page at a time, ie. a single content item (minor - this means that it is not a very useful export tool for humans - however via a robot - it could be run hundreds of times to get a folder done)
  5. The embedded media export doesn't work, no media is created (blocker - we need to be able to do images and files)
  6. Content import - A single content item works, and if the media already exists with the right id, that works. Can't judge media import, since media export fails so I have no format to follow. (blocker - we need media to work as well as a minimum. Ideally we could import all the parts of a page in one go - or even more than one page at once!)
  7. Hierarchy import - Creating a single section works. Cannot judge for subsections - since the export doesn't work. (pass?)
  8. Configuration changes can break the tool (blocker - the whole point of the project is to provide a working tool for a phased transition of content, it should work for a period of at least two years)
  9. Not sure if the tool can cope with anything but default T4 metadata (minor - A pain but the metadata changes to existing content are part of the API that should function OK directly, so could be done separately to the tools import of content.)

Once we have a consistently passing CMS tool, we can assess the best next steps.

The testing tool has proved quite complex to create too, because of the JavaScript issues mentioned above, but it now successfully tests the run of an export of a section and a piece of content, checking the exported XML file, and also runs the import for these to confirm the functionality is currently at the level listed above.

Having been burnt by my experience so far, my intention is to convert the Plone export XML and files to the new CMS's native XML format, push it to the live server and run the robot web browser to trigger its import, so that eventually we will have a viable migration route - as long as the suppliers ensure that their tool (and underlying API) are in working order.

Monday, 9 July 2012

Review of talks I attended or was recommended at Europython 2012

This is a quick jotting down of recommendations, with video links, related to this year's Europython.
It is really intended for other developers in my workplace, but in case it has wider relevance I have posted it to my blog.
Apologies for the rough format - and remember that with 5 tracks and 2 training sessions, I was only exposed to a small portion of the talks.

YouTube channel of all talks

Inspiring / Interesting talks


Permission or forgiveness
Linking women programmers, and the approach of Grace Hopper, inventor of the compiler, to the wider implications of her approach. To enable creativity in an organisation, the rules that ensure its survival must be broken. Since middle management's behaviour will inevitably default to blocking permission for innovations, just ignore them wrt. anything technical, for the greater good!

Music Theory - Genetic algorithms and Python
Fun and enthusiastic use of Python to rival the masters of classical music!

State of python
So the general view is that dynamic languages are on the up - Ruby, JS etc.
Seems that static typing snobbery is Guido's bugbear.
Increase in popularity shown by 800 attendees at ep12, compared to 500 at ep10.
Then a bunch of internal Python language decision stuff, and dealing with trolls.


Stop censorship with Python
Tor project used to allow uncensored internet in China, etc.

The larch environment
Ever wanted to write code with pictures rather than boring old text?
Pretty amazing what this PhD student has put together.

Aspect orientated programming
Possibly inspire you to stretch a paradigm as far as it will go (even to breaking point?).

SW Design and APIs

Scaling or deployment (for Django)

Django under massive loads
Good coverage of scaling Django, especially wrt. running on PostgreSQL. Coverage of classic issues with performance and the Django ORM - so for example careless use of slicing with a queryset can end up loading the whole queryset into memory.
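
As a flavour of that class of pitfall (a hedged sketch - LogEntry is just a stand-in model name, not from the talk), iterating a large queryset directly caches every row on the queryset object, whereas .iterator() streams them from the database cursor:

# Sketch of a classic Django ORM memory issue; LogEntry is a hypothetical model.
from myapp.models import LogEntry

# Iterating the queryset caches every result on it, so a big table
# ends up fully materialised in memory.
for entry in LogEntry.objects.all():
    print(entry.pk)

# .iterator() streams rows without caching them on the queryset.
for entry in LogEntry.objects.all().iterator():
    print(entry.pk)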

 How to bootstrap a startup with Django
Coverage of the standard add on components for a medium scale django deployment

 

They have released geordi, django-dogslow and interruptingcow to handle these issues.
  • geordi provides a URL based means of getting a full PDF profiling report back for pages
  • dogslow does monitoring and email reporting of hotspots, with tracebacks, from live systems.
  • interruptingcow allows setting nested timeout levels, so expensive operations can be cut short in favour of lighter ones on the web server

Spotify question session
Useful insights into scaling - particularly for large scale Python applications using Cassandra.

Need to be careful with the compaction routine: they run at half load capacity because it causes spikes, so it sometimes has to be jumped if overloaded.
Users don't see errors - fixing that last percentage is very hard, so instead they take a 'pretend it's working' approach and just retry to catch failures. Also: don't upgrade Cassandra 0.8 to 1.0.
NB: They employed Oracle guys who worked on the JVM to fix some of Cassandra's issues - well, JVM/Cassandra bugs it revealed under load!

What I learned from big web app deployments - how to scale application deployments (zope particularly)

concurrent.futures
Concurrent programming made easy. The example was bulk processing of a big Apache log. Ditch the old separate multithreading and multiprocessing libraries for this Python 3 (backported to Python 2) package.
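
Roughly the pattern shown, as a hedged sketch rather than the speaker's actual code - fan chunks of an Apache access log out over worker processes and merge the counts as they come back:

# concurrent.futures sketch: bulk process an Apache access log across processes.
# (Standard library in Python 3; 'pip install futures' for the Python 2 backport.)
import collections
import concurrent.futures


def count_statuses(lines):
    """ Count HTTP status codes in a chunk of access log lines """
    counts = collections.Counter()
    for line in lines:
        parts = line.split()
        if len(parts) > 8:
            counts[parts[8]] += 1   # status code field in common log format
    return counts


def analyse(path, chunk_size=100000):
    """ Split the log into chunks and farm them out to a process pool """
    chunks, chunk = [], []
    with open(path) as log:
        for line in log:
            chunk.append(line)
            if len(chunk) == chunk_size:
                chunks.append(chunk)
                chunk = []
    if chunk:
        chunks.append(chunk)
    totals = collections.Counter()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for counts in executor.map(count_statuses, chunks):
            totals.update(counts)
    return totals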

Language approaches

Using descriptors - useful to know some of the use cases for these language features
PyPy JIT - a look under the hood now I know that RPython is not R + Python, but restricted Python.
PyPy keynote - coverage of current activity in pypy development

Big Data / Mining 

pandas and pytables - amazingly simple API to mine data (in this case financial)

Testing

Useful insights into testing
Their set up included Jenkins, nose, etc. They run tests concurrently to speed up the test suite.
Note that Selenium was painful for them - far too brittle!

The presenter said this might be a little basic for me - and it was a bit crowded, so I went to other stuff in the end.

Other talks I attended

  1. Let your brain talk to computers
  2. Ask your BDFL
  3. Becoming a better programmer
  4. NDB: The new App Engine datastore library
  5. Advanced Flask patterns (cancelled)
  6. Big A little I (AI patterns in Python)
  7. Increasing women's engagement in Python
  8. Minimalism in Software development
  9. The integrators guide to duct taping
  10. Guidelines to writing a Python API
  11. Composite key is ready for django 1.4
  12. Heavybase: A clinical peer-to-peer database
  13. Beyond clouds: Open source edge computing with SlapOS
  14. Creating federated authorisation for a Django survey system (my talk!)

Saturday, 7 July 2012

Europython 2012 - hot stuff

Being English, I often tend to start a new conversation with the weather. At Europython this week I had good reason: it was hot, 30-35 degrees centigrade, whilst back home the UK has been bathed in ... rain, and temperatures of 20 at most. Of course for Florentines it is only a couple of degrees above the norm, so nothing worth talking about. However they were polite enough to respond to this, or any other opening conversational gambit I offered, and in general I found Europython to be a very social event this year in terms of meeting new people, probably more so than any previous conference I have been to for Python or its frameworks.

At this year's conference I attended on my own, and hence I made a bit more of an effort to be sociable. This, along with luckily getting a poster session (which could help justify work sending me!), prompted me to try and start conversations where I might normally have been more reticent.

The conference itself has a great atmosphere for mixing in any case, with possibly four main themes: core language and implementation developers, web applications, data mining and processing, and configuration management and automation tools. Of course within these there are divisions - the investment banking IPython analysers vs. the applied science academic researchers, or Pyramid vs. Django, etc. - but it seems everyone can usefully share ideas, whether they are sales engineers from a large company or pure hobbyists.

This inclusiveness was also a theme in itself, particularly wrt. women, kicking off with Alex Martelli's keynote about the inventor of the compiler (along with a lot of other things), Grace Hopper.
Unfortunately women are under-represented in the coding sector: at work I think it's around 20% of programmers, but even that is higher than the average - probably because we are public sector / unionised. This is reflected in the much lower proportion in our local DBBUG Django group, who are mainly drawn from the commercial sector, with only 2 women out of around 50 active members. Europython was as bad, at 4% last year, but that has doubled this year to around 60 of the 750 attendees.

Returning to Python themes, the chance to chat to the data miners was most useful, since we are currently in a state of transition at work. Having been involved in internal systems, particularly CMS, from the days when it was evolving 10 years ago, we are now moving to a more pure R&D role.
This means that CMS work is to be dropped; whilst we want to continue large custom web application work related to research (that's where my poster session on our Django survey system comes in), we also want to move towards work that ties up with the University's applied science research - especially big data mining and the like.
So for me the chance to talk (and listen) to people across a range of disciplines was ideal.

Lastly, I also realised how stale my knowledge of the new features of the language is. Time to get a book on Python 3 and get back on track, I think. Oh, and of course many thanks to the Italian Python community and conference organisers for a really great conference - and more than my fair share of free cocktails - which certainly helped break the ice.



Saturday, 16 June 2012

Talks vary, especially mine

This week I had the opportunity to go to a couple of gatherings and deliver two python related talks. The first was at our local Django Bath and Bristol Users Group (#DBBUG). The second was at the Google Apps for EDU European User Group (GEUG12).

I don't give talks that often, maybe 5 or 6 times a year, and I guess I share some of the traits of the stereotypical geek, in that I am not a natural extrovert who wants to be up on stage. But at the same time, there is a bit of an adrenalin rush and subsequent good feeling (maybe just relief it's over!) if a talk seems to go OK.

The first talk went well, although perhaps that was just from my perspective, having consumed a generous quantity of the free beer beforehand, provided by the hosts, Potato. It was good to see a lot of new faces at the meeting, and to hear about interesting projects from other speakers.
My talk was hardly rocket science, just a little session on the basics of Python packaging and why it's a good thing to do. But it seemed to me it was pitched at about the right technical level for the newbies to get something from it, and for the more experienced to contribute comments. It was paced about right and I pretty much followed the thread of the slides, i.e. from 'what is an egg' to CI and a local PyPI, without slavishly reading out from them. The atmosphere was relaxed and people seemed interested. All in all a good session.

The second talk unfortunately did not go well, even though I spent more time preparing it - it even triggered me into upgrading my personal App Engine site (see previous blog posting), which as described took a good few hours in itself, to make sure my App Engine knowledge was up to date. So what went wrong? Well, maybe with the previous talk going well with little preparation and no rehearsal, I had started to get a bit blasé. Hey, look at me, I can fire off a talk no problem, don't worry about it. However, to some extent the quality of any talk depends as much on the audience as the speaker - it's an interactive process - and for a few reasons I didn't feel that I had established that communication, and it threw me, I guess, to the point where I was really stumbling along at the end.

So I thought I would try and analyse some of the reasons, to try to avoid it happening again. These are all common sense and probably in any guide to public speaking - but I thought it was worth writing down, even if it's only for my benefit!

The core reason, I assume, was that there was a disconnect between the audience and what I was talking about. It wasn't that I was preaching the gospel to the humanist society - or Zope at a Django conference ;-) - I was talking about how you can use App Engine as a CMS, to a group of university educational technology staff: managers, learning support and some developers.

So the first mistake was that I had adapted a talk I had delivered to a group of developers a year before. It had gone well then because the audience, like me, were not interested in using the tools - they were interested in building the tools, in how these tools could be used to build others, and in what implications that had for how we should approach building tools in the future.

Lesson 1: Try to get an idea of your audience's background - then write the talk tailored for them from scratch (even if it's a previous talk - unless you know the audience profile hasn't changed). Also, if a previous demo and talk with questions had been an hour and now it has to be done in 20 minutes, rewrite it - or at least delete half the slides - but don't expect to talk three times faster!

Lesson 2: If you do feel that you might have pitched a talk at the wrong technical level, and there is no time to rewrite or rethink it, it's probably best to just deliver it as it stands. Moderating all the slides with 'sorry, too techie' and rephrasing things in layman's terms on the fly is probably going to be less coherent, and lose the thread of the talk anyhow - unless you are a well experienced teacher.

My first slide was entitled 'APIs in transition' - hmmm, that was a mistake; a few people immediately left the room.

Lesson 3: The most interesting thing to me, coming back to my site, was all the changes that had occurred with the platform. However, if you haven't used it before, that is irrelevant. So remember: don't focus on what you last found interesting about the topic - focus on the general picture for somebody new to it.

Lesson 4: Don't start a talk with the backend technology issues. Start it with an overview of the purpose of the technology and ideally a demo or slide of it in use. However backend your topic, it's always best to start with an idea of it from the end user perspective - even when talking to a room full of developers.

When I got to the demo part I skipped it, feeling time pressure - actually, however, it would have been best as the main initial element of the talk, with all the technical slides skipped and just referred to for those interested, finishing with the wider implications regarding sites based around mash-ups driven by a common integration framework. So ditching most of the technical details.

Lesson 5: Don't be scared to reorganise things at the last minute, before starting (see Lesson 2) - if that reorganisation is viable, eg. in terms of pruning and sequence.


There was a minor organisational issue in that I started 5 minutes before I was due to end a 20 minute talk, with no clock to keep track. So there was a feeling of having overrun almost from the start; combine that with people leaving, or even worse people staying but staring at you with a blank 'you are really boring' look!

Lesson 6: Check what time you are really expected to end before you start, and get your pace right based on this. Keep looking around the audience and try to find at least some people who look vaguely interested! Ignore the rest - it is unlikely you can ever carry the whole audience's interest, unless you are a speaker god - but you need to feel you have established some communication with at least some members of it, to keep your thread going.


OK well I could go on about a number of other failings ... but hey, I have spent longer writing about it than I did delivering it. So that will do, improve by learning from my mistakes, and move on.

User groups vary too

As a footnote another difference was the nature of the two user groups.

The DBBUG group was established, and is entirely organised, by its members: like-minded open source developers who take turns organising the meetings, etc. It's really just an excuse for techies to go to the pub together on a regular basis - and not necessarily always chat about techie stuff. It's open to anyone and completely informal.

GEUG is also largely organised by its members taking turns, but was originally established by Google's HE division for Europe and has a lot of input from their team, and it requires attendees to be members of customer institutions. So essentially it is a customer group and has much more of that feel; members attend as part of their job. Google's purpose is to use it to expand uptake in HE by generating a self-supporting community that promotes the use of its products and trials innovative use cases, to some extent feeding back into product development. There is a keynote by Google and all other talks are by community members. Lots of coloured cubes, pens, sports jackets - and a perhaps slightly rehearsed informality. But it was interestingly quite open to talks that didn't praise their products, or that demonstrated questionable use cases regarding the usual bugbear of data protection - something that is apparently a real sore spot within Europe, and the main blocker to cloud adoption in HE.

Having once attended a Microsoft user group event in Dublin at the end of the 90s, I would say that this was a long way removed from that. The Microsoft event was strictly controlled: no community speakers, nothing but full sales-engineer-style talks about 'faultless' products, and no discussion of flaws or even of approaches that could generate technical criticisms. Everybody wore suits - maybe that is just the way software sales were way back when Microsoft dominated the browser and desktop.

Whereas now, community is where it's at. Obviously GEUG felt slightly less genuinely community-driven after DBBUG, but I would praise Google for being significantly less controlling than some of their competitors about presenting a faultless, technical, suited business face to HE. Unfortunately for them, non-technical managers with their hands on the purse strings tend to be largely persuaded by the surface froth of suits and traditional commercial software sales - disguising flaws, rather than allowing discussion of whether and how they may be addressed.

In essence the Google persona is carefully crafted to sit more towards an open source one, but as a result it may suffer from the same distrust that traditional non-technical clients have for open source over commercial systems. Having said that, they are not doing too badly ... dominating cloud use in US HE. Maybe European HE is just a tougher old nut to crack.

Tuesday, 5 June 2012

Upgrading a Google App Engine Django app

In my day to day work, I haven't had an opportunity to use Google App Engine (GAE). So to get up to speed with it, and since it offers free hosting for low usage, I created my home site on the platform a year ago. The site uses App Engine as a base, and integrates Google Apps for any content other than custom content types.

Recently I have been upgrading the Django infrastructure we use at work, from Python 2.6 to 2.7 and Django 1.3 to 1.4. I thought, it having been a year, it was probably time to upgrade my home site too, having vaguely registered a few changes being announced in App Engine. On tackling the process, I realised that 'a few changes' is an understatement.

Over the last year GAE has moved from a beta service to a full Google service. Its pricing model has changed, its backend storage has changed, the python deployment environment has changed and the means of integrating the Django ORM has changed. A raft of other features have also been added. So what does an upgrade entail?

Let's start with where we were in spring 2011

  1. Django 1.2 (or 0.96) running on a Python 2.5 single threaded CGI environment
  2. The system stores data in the Master/Slave Big Table datastore
  3. Django's standard ORM django.db is replaced by the NOSQL google.appengine.ext.db *
  4. To retain the majority of forms functionality ext.djangoforms replaces forms *
  5. python-gdata is used as the standard means to integrate with Google Apps via common RESTful Atom based APIs

    * as an alternative django-norel could have been used to provide full standard ORM integration - but this was overkill for my needs

To configure GAE a typical app.yaml would have been the following:

application: myapp
version: 1-0
runtime: python
api_version: 1

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

# handlers match in order, so the static media mapping must come before the catch-all
- url: /media
  static_dir: _generated_media
  secure: optional

- url: /.*
  script: main.py
With the main CGI script to run it

import os

from google.appengine.ext.webapp import util
from google.appengine.dist import use_library

use_library('django', '1.2')
os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'

import django.core.handlers.wsgi
from django.conf import settings

# Force Django to reload its settings.
settings._target = None

def main():
  # Create a Django application for WSGI.
  application = django.core.handlers.wsgi.WSGIHandler()

  # Run the WSGI CGI handler with that application.
  util.run_wsgi_app(application)

if __name__ == '__main__':
  main()

What has changed over the last year

Now we have multi-threaded WSGI Python 2.7 with a number of other changes.

  1. Django 1.3 (or 1.2) running on a Python 2.7 multi threaded WSGI environment
  2. The system stores data in the HRD Big Table datastore
  3. For Big Table the NOSQL google.appengine.ext.db is still available, but Django's standard ORM django.db is soon to be available for hosted MySQL
  4. google.appengine.ext.djangoforms is not available any more
    Recommendation is either to stop using ModelForms and hand-crank data writing from plain Forms - or use django-norel, but it does have a startup overhead *
  5. python-gdata is still used, but it is being replaced by simpler JSON APIs specific to the App in question, managed by the APIs console and accessible via google-api-python-client.

    * django-norel support has moved from its previous authors - with the Django 1.4 rewrite still a work in progress

Hmmm ... that's a lot of changes - hopefully now we are out of beta there won't be so many in another year's time! So how do we go about migrating our old GAE Django app?

Migration

Firstly, the Python 2.7 WSGI environment requires a different app.yaml and main.py. Now to configure GAE a typical app.yaml would be:

application: myapp-hrd
version: 2-0
runtime: python27
api_version: 1
threadsafe: true

libraries:
- name: PIL
  version: latest
- name: django
  version: "1.3"

builtins:
- django_wsgi: on
- remote_api: on

handlers:
# Must use threadsafe: false to use remote_api handler script?
#- url: /remote_api
#  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
#  login: admin

# handlers match in order, so the static media mapping must come before the catch-all
- url: /media
  static_dir: _generated_media
  secure: optional

- url: /.*
  script: main.app
With the main script to run it just needing...
import os
import django.core.handlers.wsgi

os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'
app = django.core.handlers.wsgi.WSGIHandler()
But why is the app-id now myapp-hrd rather than myapp?
In order to use Python 2.7 you have to move to the HRD datastore. To migrate the application from the deprecated Master/Slave datastore it must be replaced with a new application. New applications now always use HRD.
Go to the admin console and 'Application Settings' and at the bottom are the migration tools. These wrap up the creation of a new myapp-hrd which you have to upload / update the code for in the usual manner. Once you have fixed your code to work in the environment (see below) - upload it.
The migration tool's main component is for pushing data from the old to the new datastore and locking writes to manage roll over. So assuming all that goes smoothly you now have a new myapp-hrd with data in ready to go, which you can point your domain at.

NB: Or you can just use the remote_api to load data - so for example to download the original data to your local machine for loading into your dev_server:
${GAE_ROOT}/appcfg.py download_data --application=myapp \
  --url=http://myapp.appspot.com/remote_api --filename=proddata.sql3

${GAE_ROOT}/appcfg.py upload_data --filename=proddata.sql3 \
  ${MYAPP_ROOT}/myapp --url=http://localhost:8080/remote_api \
  --email=foo@bar --passin --application=dev~myapp-hrd

Fixing your code for GAE python27 WSGI

Things are not quite as straightforward as you might think from using the dev server to test your application prior to upload. The dev server's CGI environment no longer replicates the deployed WSGI environment quite so well - like the differences between using Django's dev server and running it via Apache mod_wsgi. For one thing, any CGI script imports OK as before on the dev server, yet may not work on upload, or may require config adjustments - e.g. ext.djangoforms is not there, and use of any of the existing utility scripts, such as the remote_api script for data loading, requires disabling the multi-threaded performance benefits. Probably the workaround here, for more production scale sites than mine, is to have a separate app for utility usage from the one that runs the sites.

If you used ext.djangoforms, you either have to move to django-norel or do the data writes in code directly. For my simple use case I wrote a pair of utility functions to do data writes for me, and switched my ModelForms to plain Forms.


def get_ext_db_dicts(instance):
    """ Given an appengine ext.db instance return dictionaries
        for its values and types - to use with django forms
    """
    value_dict = {}
    type_dict = {}
    for key, field in instance.fields().items():
        try:
            value_dict[key] = getattr(instance, key, '')
            type_dict[key] = field.data_type
        except:
            pass
    return value_dict, type_dict


def write_attributes(request, instance):
    """ Quick fix replacement of ModelForm set attributes
        TODO: add more and better type conversions
    """
    value_dict, type_dict = get_ext_db_dicts(instance)
    for field, ftype in type_dict.items():
        if request.POST.has_key(field):
            if ftype == type(''):
                value = str(request.POST.get(field, ''))
            elif ftype == type(1000):
                try:
                    value = int(request.POST.get(field, 0))
                except:
                    value = 0
            elif ftype == type([]):
                value = request.POST.getlist(field)
            else:
                value = str(request.POST.get(field, ''))
            setattr(instance, field, value)

Crude but it allows one line form population for editing ...
mycontentform = MyContentForm(value_dict)
... and instance population for saving ...
write_attributes(request, instance)
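
Tied together in an edit view it looks roughly like this - a hedged sketch, where MyContent and MyContentForm are stand-in names for an ext.db content type and its plain Form, and the two helpers above are assumed to be importable from a utils module:

# Sketch of an edit view using plain Forms plus the helpers above, instead of
# the no-longer-available ext.djangoforms ModelForm. Names are illustrative.
from django.shortcuts import render_to_response

from edcrewe.models import MyContent          # hypothetical ext.db model
from edcrewe.forms import MyContentForm       # hypothetical plain django Form
from edcrewe.utils import get_ext_db_dicts, write_attributes


def edit_content(request, key_name):
    instance = MyContent.get_by_key_name(key_name)
    if request.method == 'POST':
        form = MyContentForm(request.POST)
        if form.is_valid():
            # Copy the posted values onto the ext.db instance and save it
            write_attributes(request, instance)
            instance.put()
    else:
        value_dict, type_dict = get_ext_db_dicts(instance)
        form = MyContentForm(value_dict)
    return render_to_response('edit_content.html', {'form': form})
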
However, even after these fixes and the data import, I still had another task. Images uploaded as content fields were not transferred, so these had to be manually redone. This is maybe my fault for not using the blobstore for them - i.e. since they were small images they were just saved to the Master/Slave datastore - but pretty annoying even so.

Apps APIs

Finally there is the issue of the gdata APIs being in a state of flux. Currently the new APIs don't provide sufficient functionality, so given that this API move by Google still seems to be in progress - and how many changes the App Engine migration required - I think I will leave things be for the moment and stick with python-gdata ... maybe in a year's time!

Tuesday, 27 March 2012

Django - generic class views, decorator class

I am currently writing a permissions system for a Django based survey application and we wanted a nice clean implementation for testing users for appropriate permissions on the objects displayed on a page.

Django has added class-based views in addition to the older function based ones. Traditionally, tasks such as testing authorisation have been applied via decorator functions.

 @login_required
 def my_view(request):
     return response

The recommended approach for a class view is to apply a method decorator.
There is a utility converter, method_decorator, to adapt function decorators for this.

 class ProtectedView(TemplateView):
    @method_decorator(login_required)
    def dispatch(self, *args, **kwargs):
        return super(ProtectedView, self).dispatch(*args, **kwargs)

However this isn't ideal, since it is less easy to check all the methods for decorators than to just look at the top of the view function as before.

So why not use a class decorator instead, to make things clearer? Fine, except we do actually want to decorate the dispatch method. But we can add a utility decorator that wraps this up.*

@class_decorator(login_required, 'dispatch')
class ProtectedView(TemplateView):
    def dispatch(self, *args, **kwargs):
        return super(ProtectedView, self).dispatch(*args, **kwargs)
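
The class_decorator utility itself is not shown in full here (see the footnote at the end), but a minimal sketch of the idea - an assumption of roughly how it might look, not the exact code - is simply to look up the named method on the class and wrap it with method_decorator:

# Minimal sketch of a class_decorator utility: apply a function decorator
# to a named method of a class based view.
from django.utils.decorators import method_decorator


def class_decorator(decorator, method_name):
    def _wrap_class(View):
        method = getattr(View, method_name)
        setattr(View, method_name, method_decorator(decorator)(method))
        return View
    return _wrap_class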

But Django's generic class views contain more than just TemplateView; there are generic list, detail and update views, all of which use a standard pattern to associate object(s) with the context data. Not only that, but the request will also have user data populated if the view requires a login.

What I want is a simple decorator that just takes a list of permissions, ensures users who access the class view must log in, and then has each of these object permissions checked against the context data object(s). So my decorator for authorising user object permissions will be @class_permissions('view', 'edit', 'delete').

To do this, the class_permissions decorator itself is best written as a class. The class can then combine the actions of three method decorators on the two Django generic class view methods, dispatch and get_context_data.
Firstly login_required wraps dispatch, then dispatch_setuser wraps this to set the user that login_required delivers as an attribute of the class_permissions class.
These must decorate in the correct order to work.

Finally class_permissions wraps get_context_data to grab the view object(s). The user, permissions and objects can now all be used to test for object level authorisation - before a user is allowed access to the view. The core bits of the final code are below - my class decorator class is done :-)


class class_permissions(object):
    """ Tests the objects associated with class views
        against permissions list. """
    perms = []
    user = None
    view = None
 
    def __init__(self, *args):
        self.perms = args

    def __call__(self, View):
        """ Main decorator method """
        self.view = View

        def _wrap(request=None, *args, **kwargs):
            """ double decorates dispatch 
                decorates get_context_data
                passing itself which has the required data                                              
            """
            setter = getattr(View, 'dispatch', None)
            if setter:
                decorated = method_decorator(
                                   dispatch_setuser(self))(setter)
                setattr(View, setter.__name__,
                        method_decorator(login_required)(decorated))
            getter = getattr(View, 'get_context_data', None)
            if getter:
                setattr(View, getter.__name__,
                        method_decorator(
                               decklass_permissions(self))(getter))
            return View
        return _wrap()


The function decorators and imports that are used by the decorator class above

from functools import wraps

from django.contrib.auth.decorators import login_required
from django.utils.decorators import available_attrs, method_decorator

def decklass_permissions(decklass):
    """ The core decorator that checks permissions """

    def decorator(view_func):
        """ Wraps get_context_data on generic view classes """
        @wraps(view_func, assigned=available_attrs(view_func))
        def _wrapped_view(**kwargs):
            """ Gets objects from get_context_data and runs check """
            context = view_func(**kwargs)
            obj_list = context.get('object_list', [])
            if not obj_list:
                obj = context.get('subobject',
                                  context.get('object', None))
                if obj:
                    obj_list = [obj, ]
            check_permissions(decklass.perms, decklass.user, obj_list)
            return context
        return _wrapped_view
    return decorator

def dispatch_setuser(decklass):
    """ Decorate dispatch to add user to decorator class """

    def decorator(view_func):
        @wraps(view_func, assigned=available_attrs(view_func))
        def _wrapped_view(request, *args, **kwargs):
            if request:
                decklass.user = request.user
            return view_func(request, *args, **kwargs)
        return _wrapped_view
    return decorator


Although all this works fine, it does seem overly burdened with syntactic sugar. I imagine there may be a more concise way to achieve the results I want. If anyone can think of one, please comment below.

* I didn't show the full code for class_decorator since it is just a simplified version of the class_permissions example above - the short sketch earlier gives the idea.

Sunday, 12 February 2012

Finally I have left behind the printed word

The printing press was invented by Gutenberg getting on for 600 years ago; as a technology the printed book has had a good innings compared to the cassette tape or the floppy disk. But maybe it's time to recognise that, at least as far as text-only mass paperback printed media goes, it is unlikely to reach that 600th birthday. So I finally decided to stop being a Luddite and buy an e-reader this Christmas, along with 1.3 million other people in Britain. I actually bought it for my partner, but I have ended up hogging it.

So what's so great about an e-reader? Well, it's a good excuse to get back into the classics: anything out of copyright is available free. Over the last month I managed to plough through a book apiece by George Orwell, Thomas Hardy, Charles Dickens, Joseph Conrad and Emily Brontë. In addition, any book is available instantaneously, either from repositories or from Amazon and its competitors. Or, for the less scrupulous, there is the twilight world of file sharing networks, where all e-books are available for free.

Once the habit of reading via a Kindle or other e-reader is formed, it quickly becomes just as natural as turning pages. I prefer the cheaper LCD greyscale readers without touch screen or keyboard - they are the simplest direct replacement for a paperback, and if current price wars continue they will soon cost about the same as a hardback! For that you get wireless, space for 1,500 books and a month's battery life.

So you may be thinking: OK, big deal, but why are you talking about this on a Python blog? Well, the reason is I fancied reading other digital text formats on my e-reader - what if I want a copy of a PDF, some Sphinx docs or even a whole website, for example? I soon came across the open source Calibre software, written in Python and C and available on Linux, Mac and Windows. Kovid Goyal may not be Gutenberg, but he has certainly produced a really good e-book management software package.

Once you have installed the software it sets up that machine as your personal e-book library, to which you can add pretty much any text format you wish, along with news feeds or other HTML content. Plug in your e-reader, then press the convert e-book button to translate them to .mobi or whatever, and another to send them to the device. BeautifulSoup is used to help with HTML conversion and there is an API for creating conversion recipes. The software also includes a server to make your personal e-library available to you over the web, book repository browsers, synchronisation tools, handling of annotations, etc.
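
As a flavour of that recipe API (a hedged sketch - the feed URL is just an example of a Blogger feed, and the attribute values are arbitrary), a recipe is a small Python class that Calibre fetches, cleans up and converts to an e-book:

# Sketch of a Calibre news "recipe" - a small Python class describing what to
# fetch and how to clean it before conversion. The feed URL is only an example.
from calibre.web.feeds.news import BasicNewsRecipe


class EdCreweBlog(BasicNewsRecipe):
    title = 'Ed Crewe blog'
    oldest_article = 30            # how many days of posts to fetch
    max_articles_per_feed = 50
    no_stylesheets = True

    feeds = [
        ('Posts', 'http://edcrewe.blogspot.com/feeds/posts/default'),
    ]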

Friends talk about missing the tactile experience, the smell, but with a good e-reader and good e-book management software I can't really justify lugging around those funny old glued-together sheets of bleached wood pulp any more ;-)