Ed Crewe: June 2012

Saturday 16 June 2012

Talks vary, especially mine

This week I had the opportunity to go to a couple of gatherings and deliver two python related talks. The first was at our local Django Bath and Bristol Users Group (#DBBUG). The second was at the Google Apps for EDU European User Group (GEUG12).

I don't do a talk that often, maybe 5 or 6 times a year, and I guess share some of the traits of the stereotyped geek that I am not a natural extrovert who wants to be up on stage. But at the same time, there is a bit of an adrenalin rush and subsequent good feeling (maybe just relief its over!) if a talk seems to go OK.

The first talk went well, although perhaps that was just from my perspective, having consumed a generous quantity of the free beer before hand, provided by the hosters, Potato . It was good to see a lot of new faces at the meeting, and hear about interesting projects from other speakers.
My talk was hardly rocket science, instead just a little session on the basics of python packaging and why its a good thing to do it. But it seemed to me it was pitched at about the right technical level for the newbies to get something from it, and the more experienced to contribute comments. It was paced about right and I pretty much followed the thread of the slides ie. from 'what is an egg' to C.I. and local PyPi, without slavishly reading out from them. The atmosphere was relaxed and people seemed interested. All in all a good session.

The second talk unfortunately did not go well, even though I spent more time preparing it - it even triggered me into upgrading my personal App Engine site (see previous blog posting) - which as described took a good few hours in itself - to make sure my App Engine knowledge was up to date. So what went wrong. Well maybe with the previous talk going well with little preparation and no rehearsal - I had started to get a bit blase. Hey look at me, I can fire off a talk no problem, don't worry about it. However to some extent the quality of any talk depends as much on the audience as the speaker - its an interactive process - and for a few reasons, I didn't feel that I had established that communication - and it threw me, I guess, to the point where I was really stumbling along at the end.

So I thought I would try and analyse some of the reasons, to try to avoid it happening again. These are all common sense and probably in any guide to public speaking - but I thought its worth writing it down - even if its only for my benefit!

The core reason was that I assumed that there was a disconnect between the audience, and what I was talking about. So I wasn't preaching the gospel to the humanist society - or zope at a django conference ;-) I was talking about how you can use App Engine as a CMS, to a group of University educational technology staff - managers, learning support and some developers.

So first mistake was that I had adapted a talk that I had delivered to a group of developers a year before. It had gone well before because the audience, like me, were not interested in using the tools - they were interested in building the tools, how these tools could be used to build others - and what implications that had for how we should approach building tools in the future.

Lesson 1: Try to get an idea of your audience's background - then write the talk tailored for them from scratch (even if its a previous talk - unless you know the audience profile hasn't changed). Also if a previous demo and talk with questions had been an hour - and now it has to be done in 20 minutes - rewrite it - or at least delete half the slides - but don't expect to talk three times faster!

Lesson 2: If you do feel that you might of pitched a talk at the wrong technical level - and there is no time to rewrite or rethink it, its probably best to just deliver it as it stands. Moderating all the slides with 'sorry too techie' and rephrasing things in layman's terms on the fly - is probably going to be less coherent, and lose the thread of the talk anyhow - unless you are a well experienced teacher.

My first slide was entitled APIs in transition - hmmm that was a mistake, a few people immediately left the room.

Lesson 3: The most interesting initial thing to me coming back to my site, were all the changes that had occurred with the platform. However if you haven't used it before that is irrelevant. So remember don't focus on what you last found interesting about the topic - focus on the general picture for somebody new to it.

Lesson 4: Don't start a talk with the backend technology issues. Start it with an overview of the purpose of the technology and ideally a demo or slide of it in use. However backend your topic, its always best to start with an idea of it from the end user perspective - even when talking to a room full of developers.

When I got to the demo part I skipped it due to feeling time pressure - actually however this would of been best acting as the main initial element of the talk, with all the technical slides skipped and just referred to for those interested. Finishing with the wider implications regarding sites based around mash-ups, driven by a common integration framework. So ditching most of the technical details.

Lesson 5: Don't be scared to reorganise things at the last minute, before starting (see Lesson 2) - if that reorganisation is viable, eg. in terms of pruning and sequence.

There was a minor organisational issue in that I started 5 minutes before I was due to end a 20 minute talk, with no clock to keep track. So there was a feeling of having over run almost from the start, combine that with people leaving or even worse people staying but staring at you with a blank, 'You are really boring' look!

Lesson 6: Check what time you are really expected to end before you start and get your pace right based on this. Keep looking around the audience and try to just find a least some people who look vaguely interested! - ignore the rest - it is unlikely you can ever carry the whole audience's interest - unless you are a speaker god - but you need to feel you have established some communication with at least some members of it - to keep your thread going.

OK well I could go on about a number of other failings ... but hey, I have spent longer writing about it than I did delivering it. So that will do, improve by learning from my mistakes, and move on.

User groups vary too

As a footnote another difference was the nature of the two user groups.

The DBBUG group was established and is entirely organised by its members, like minded open source developers, who take turns organising the meetings, etc. Its really just an excuse for techies to go to the pub together on a regular basis - and not necessarily always chat about techie stuff. Its open to anyone and completely informal.

GEUG is also largely organised by its members taking turns, but was originally established by Google's HE division for Europe and has a lot of input from their team, it requires attendees to be members of customer institutions. So essentially its a customer group and has much more of that feel. Members attend as a part of their job. Google's purpose is to use it to expand its uptake in HE - by generating a self supporting community that promotes the use of its products, and trials innovative use cases. To some extent feeding back into product development. With a keynote by Google and all other talks by the community members. Lots of coloured cubes, pens, sports jackets - and a perhaps, slightly rehearsed, informality. But interestingly quite open to talks that perhaps didn't praise their products or demonstrated questionable use cases, regarding the usual bugbear of data protection. Something that is a real sore spot within Europe apparently - the main blocker to cloud adoption in HE.

Having once attended a Microsoft user group event in Dublin at the end of the 90s, I would say that this was a long way removed from that. The Microsoft event was strictly controlled, no community speakers, nothing but full sales engineer style talks about 'faultless' products, there was no discussion of flaws or even of approaches that could generate technical criticisms. Everybody wore suits - maybe that is just the way software sales were way back when Microsoft dominated the browser and desktop.

Where as now community is where it is at. Obviously GEUG felt slightly less genuinely community after DBBUG, but I would praise Google that they are significantly less controlling over shaping a faultless technical suited business face to HE, than some of their competitors. Unfortunately for them non-technical managers with their hands on the purse strings tend to be largely persuaded by the surface froth of suits and traditional commercial software sales - disguise flaws, rather than allow discussion of whether and how they may be addressed.

In essence the Google persona that is carefully crafted to sit more towards an open source one, but as a result may suffer from the same distrust that traditional non-technical clients have for open source over commercial systems. Having said that they are not doing too badly ... dominating cloud use in US HE. Maybe Europe HE is just a tougher old nut to crack.

Tuesday 5 June 2012

Upgrading a Google App Engine Django app

In my day to day work, I haven't had an opportunity to use Google App Engine (GAE). So to get up to speed with it, and since it offers free hosting for low usage, I created my home site on the platform a year ago. The site uses App Engine as a base, and integrates in Google Apps for any content other than custom content types.

Recently I have been upgrading the Django infrastructure we use at work, from Python 2.6 to 2.7 and Django 1.3 to 1.4. I thought having been a year, it was probably time to upgrade my home site too, having vaguely registered a few changes in App Engine being announced. On tackling the process, I realised that 'a few changes' is an understatement.

Over the last year GAE has moved from a beta service to a full Google service. Its pricing model has changed, its backend storage has changed, the python deployment environment has changed and the means of integrating the Django ORM has changed. A raft of other features have also been added. So what does an upgrade entail?

Lets start with where we were in Spring 2011

Django 1.2 (or 0.96) running on a Python 2.5 single threaded CGI environment
The system stores data in the Master/Slave Big Table datastore
Django's standard ORM django.db is replaced by the NOSQL google.appengine.ext.db *
To retain the majority of forms functionality ext.djangoforms replaces forms *
python-gdata is used as the standard means to integrate with Google Apps via common RESTful Atom based APIs

* as an alternative django-norel could of been used to provide full standard ORM integration - but this was overkill for my needs

To configure GAE a typical app.yaml would of been the following:


application: myapp
version: 1-0
runtime: python
api_version: 1

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

- url: /.*
  script: main.py

- url: /media
  static_dir: _generated_media
  secure: optional

With the main CGI script to run it


import os

from google.appengine.ext.webapp import util
from google.appengine.dist import use_library

use_library('django', '1.2')
os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'

import django.core.handlers.wsgi
from django.conf import settings

# Force Django to reload its settings.
settings._target = None

def main():
  # Create a Django application for WSGI.
  application = django.core.handlers.wsgi.WSGIHandler()

  # Run the WSGI CGI handler with that application.
  util.run_wsgi_app(application)

if __name__ == '__main__':
  main()

What has changed over the last year

Now we have multi-threaded WSGI Python 2.7 with a number of other changes.

Django 1.3 (or 1.2) running on a Python 2.7 multi threaded WSGI environment
The system stores data in the HRD Big Table datastore
For Big Table the NOSQL google.appengine.ext.db is still available, but Django's standard ORM django.db is soon to be available for hosted MySQL
google.appengine.ext.djangoforms is not available any more
Recommendation is either to stop using ModelForms and hand crank data writing from plain Forms - or use django-norel - but it does have a startup overhead *
python-gdata is still used but it is being replaced by simpler JSON APIs specific to the App in question, managed by the APIs console and accessible via google-api-python.

* django-norel support has moved from its previous authors - with the Django 1.4 rewrite still a work in progress

Hmmm ... thats a lot of changes - hopefully now we are out of beta - there won't be so many in another year's time! So how do we go about migrating our old GAE Django app.

Migration

Firstly the Python 2.7 WSGI environment requires a different app.yaml and main.py

Now to configure GAE a typical app.yaml would be:

application: myapp-hrd
version: 2-0
runtime: python27
api_version: 1
threadsafe: true

libraries:
- name: PIL
  version: latest
- name: django
  version: "1.3"

builtins:
- django_wsgi: on
- remote_api: on

handlers:
# Must use threadsafe: false to use remote_api handler script?
#- url: /remote_api
#  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
#  login: admin

- url: /.*
  script: main.app

- url: /media
  static_dir: _generated_media
  secure: optional

With the main script to run it just needing...

import os
import django.core.handlers.wsgi

os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'
app = django.core.handlers.wsgi.WSGIHandler()

But why is the app-id now myapp-hrd rather than myapp?
In order to use Python 2.7 you have to move to the HRD data store. To migrate the application from the deprecated Master/Slave data store it must be replaced with a new application. New applications now always uses HRD.
Go to the admin console and 'Application Settings' and at the bottom are the migration tools. These wrap up the creation of a new myapp-hrd which you have to upload / update the code for in the usual manner. Once you have fixed your code to work in the environment (see below) - upload it.
The migration tool's main component is for pushing data from the old to the new datastore and locking writes to manage roll over. So assuming all that goes smoothly you now have a new myapp-hrd with data in ready to go, which you can point your domain at.

NB: Or you can just use the remote_api to load data - so for example to download the original data to your local machine for loading into your dev_server:

${GAE_ROOT}/appcfg.py download_data --application=myapp
--url=http://myapp.appspot.com/remote_api --filename=proddata.sql3

${GAE_ROOT}/appcfg.py upload_data --filename=proddata.sql3 
${MYAPP_ROOT}/myapp --url=http://localhost:8080/remote_api 
--email=foo@bar --passin --application=dev~myapp-hrd

Fixing your code for GAE python27 WSGI

Things are not quite as straight forward as you may think from using the dev server to test your application prior to upload. The dev server's CGI environment no longer replicates the deployed WSGI environment quite so well - like the differences between using Django's dev server and running it via Apache mod_wsgi. For one thing any CGI script imports OK as before for the dev server - yet may not work on upload or require config adjustments - e.g. ext.djangoforms is not there, and use of any of the existing utility scripts - such as the remote_api script for data loading, requires disabling of the multi-threaded performance benefits. Probably the workaround here for more production scale sites than mine, is to have a separate app for utility usage than the one that runs the sites.

If you used ext.djangoforms, either you have to move to django-norel or do code writes directly. For my simple use case I wrote a simple pair of utility functions to do data writes for me, and switched my ModelForms to plain Forms.


def get_ext_db_dicts(instance):                                 
    """ Given an appengine ext.db instance return dictionaries
        for its values and types - to use with django forms
    """                                               
    value_dict = {}                                                
    type_dict = {}                                              
    for key, field in instance.fields().items():                                                 
        try:                                                                      
            value_dict[key] = getattr(instance, key, '')                                                
            type_dict[key] = field.data_type                                                          
        except:                                                                   
            pass                                                        
    return value_dict, type_dict      

def write_attributes(request, instance):
    """ Quick fix replacement of ModelForm set attributes
        TODO: add more and better type conversions
    """
    value_dict, type_dict = get_ext_db_dicts(instance)                                                                                                 
    for field, ftype in type_dict.items():                                                                                                             
        if request.POST.has_key(field):                                                                                                                
            if ftype == type(''):                                                                                                                      
                value = str(request.POST.get(field, ''))                                                                                               
            elif ftype == type(1000):                                                                                                                  
                try:                                       
                    value = int(request.POST.get(field, 0))                             
                except:         
                    value = 0                                                                                                                
            elif ftype == type([]):                              
                value = request.POST.getlist(field)                            
            else:                                                              
                value = str(request.POST.get(field, ''))                                                                                                                                                                   
            setattr(instance, field, value)

Crude but it allows one line form population for editing ...
mycontentform = MyContentForm(value_dict)
... and instance population for saving ...
write_attributes(request, instance)

However even after these fixes and data import, I still had another task. Images uploaded as content fields were not transferred - so these had to be manually redone. This is maybe my fault for not using the blobstore for them - ie since they were small images they are were just saved to the Master/slave data store - but pretty annoying even so.

Apps APIs

Finally there is the issue of gdata APIs being in a state of flux. Well currently the new APIs don't provide sufficient functionality and so given that this API move by Google still seems to be in progress - and how many changes the App Engine migration required - I think I will leave things be for the moment and stick with gdata-python ... maybe in a years time!

Ed Crewe