Tuesday 5 June 2012

Upgrading a Google App Engine Django app

In my day to day work, I haven't had an opportunity to use Google App Engine (GAE). So to get up to speed with it, and since it offers free hosting for low usage, I created my home site on the platform a year ago. The site uses App Engine as a base, and integrates in Google Apps for any content other than custom content types.

Recently I have been upgrading the Django infrastructure we use at work, from Python 2.6 to 2.7 and Django 1.3 to 1.4. I thought having been a year, it was probably time to upgrade my home site too, having vaguely registered a few changes in App Engine being announced. On tackling the process, I realised that 'a few changes' is an understatement.

Over the last year GAE has moved from a beta service to a full Google service. Its pricing model has changed, its backend storage has changed, the python deployment environment has changed and the means of integrating the Django ORM has changed. A raft of other features have also been added. So what does an upgrade entail?

Lets start with where we were in Spring 2011

  1. Django 1.2 (or 0.96) running on a Python 2.5 single threaded CGI environment
  2. The system stores data in the Master/Slave Big Table datastore
  3. Django's standard ORM django.db is replaced by the NOSQL google.appengine.ext.db *
  4. To retain the majority of forms functionality ext.djangoforms replaces forms *
  5. python-gdata is used as the standard means to integrate with Google Apps via common RESTful Atom based APIs

    * as an alternative django-norel could of been used to provide full standard ORM integration - but this was overkill for my needs
To configure GAE a typical app.yaml would of been the following:

application: myapp
version: 1-0
runtime: python
api_version: 1

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin

- url: /.*
  script: main.py

- url: /media
  static_dir: _generated_media
  secure: optional 
With the main CGI script to run it

import os

from google.appengine.ext.webapp import util
from google.appengine.dist import use_library

use_library('django', '1.2')
os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'

import django.core.handlers.wsgi
from django.conf import settings

# Force Django to reload its settings.
settings._target = None

def main():
  # Create a Django application for WSGI.
  application = django.core.handlers.wsgi.WSGIHandler()

  # Run the WSGI CGI handler with that application.
  util.run_wsgi_app(application)

if __name__ == '__main__':
  main()

What has changed over the last year

Now we have multi-threaded WSGI Python 2.7 with a number of other changes.

  1. Django 1.3 (or 1.2) running on a Python 2.7 multi threaded WSGI environment
  2. The system stores data in the HRD Big Table datastore
  3. For Big Table the NOSQL google.appengine.ext.db is still available, but Django's standard ORM django.db is soon to be available for hosted MySQL
  4. google.appengine.ext.djangoforms is not available any more
    Recommendation is either to stop using ModelForms and hand crank data writing from plain Forms - or use django-norel - but it does have a startup overhead *
  5. python-gdata is still used but it is being replaced by simpler JSON APIs specific to the App in question, managed by the APIs console and accessible via google-api-python.

    * django-norel support has moved from its previous authors - with the Django 1.4 rewrite still a work in progress
Hmmm ... thats a lot of changes - hopefully now we are out of beta - there won't be so many in another year's time! So how do we go about migrating our old GAE Django app.

Migration

Firstly the Python 2.7 WSGI environment requires a different app.yaml and main.py 
Now to configure GAE a typical app.yaml would be:
application: myapp-hrd
version: 2-0
runtime: python27
api_version: 1
threadsafe: true

libraries:
- name: PIL
  version: latest
- name: django
  version: "1.3"

builtins:
- django_wsgi: on
- remote_api: on

handlers:
# Must use threadsafe: false to use remote_api handler script?
#- url: /remote_api
#  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
#  login: admin

- url: /.*
  script: main.app

- url: /media
  static_dir: _generated_media
  secure: optional 
With the main script to run it just needing...
import os
import django.core.handlers.wsgi

os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'
app = django.core.handlers.wsgi.WSGIHandler()
But why is the app-id now myapp-hrd rather than myapp?
In order to use Python 2.7 you have to move to the HRD data store. To migrate the application from the deprecated Master/Slave data store it must be replaced with a new application. New applications now always uses HRD.
Go to the admin console and 'Application Settings' and at the bottom are the migration tools. These wrap up the creation of a new myapp-hrd which you have to upload / update the code for in the usual manner. Once you have fixed your code to work in the environment (see below) - upload it.
The migration tool's main component is for pushing data from the old to the new datastore and locking writes to manage roll over. So assuming all that goes smoothly you now have a new myapp-hrd with data in ready to go, which you can point your domain at.

NB: Or you can just use the remote_api to load data - so for example to download the original data to your local machine for loading into your dev_server:
${GAE_ROOT}/appcfg.py download_data --application=myapp
--url=http://myapp.appspot.com/remote_api --filename=proddata.sql3

${GAE_ROOT}/appcfg.py upload_data --filename=proddata.sql3 
${MYAPP_ROOT}/myapp --url=http://localhost:8080/remote_api 
--email=foo@bar --passin --application=dev~myapp-hrd

Fixing your code for GAE python27 WSGI

Things are not quite as straight forward as you may think from using the dev server to test your application prior to upload. The dev server's CGI environment no longer replicates the deployed WSGI environment quite so well - like the differences between using Django's dev server and running it via Apache mod_wsgi. For one thing any CGI script imports OK as before for the dev server - yet may not work on upload or require config adjustments - e.g. ext.djangoforms is not there, and use of any of the existing utility scripts - such as the remote_api script for data loading, requires disabling of the multi-threaded performance benefits. Probably the workaround here for more production scale sites than mine, is to have a separate app for utility usage than the one that runs the sites.

If you used ext.djangoforms, either you have to move to django-norel or do code writes directly. For my simple use case I wrote a simple pair of utility functions to do data writes for me, and switched my ModelForms to plain Forms.


def get_ext_db_dicts(instance):                                 
    """ Given an appengine ext.db instance return dictionaries
        for its values and types - to use with django forms
    """                                               
    value_dict = {}                                                
    type_dict = {}                                              
    for key, field in instance.fields().items():                                                 
        try:                                                                      
            value_dict[key] = getattr(instance, key, '')                                                
            type_dict[key] = field.data_type                                                          
        except:                                                                   
            pass                                                        
    return value_dict, type_dict      

def write_attributes(request, instance):
    """ Quick fix replacement of ModelForm set attributes
        TODO: add more and better type conversions
    """
    value_dict, type_dict = get_ext_db_dicts(instance)                                                                                                 
    for field, ftype in type_dict.items():                                                                                                             
        if request.POST.has_key(field):                                                                                                                
            if ftype == type(''):                                                                                                                      
                value = str(request.POST.get(field, ''))                                                                                               
            elif ftype == type(1000):                                                                                                                  
                try:                                       
                    value = int(request.POST.get(field, 0))                             
                except:         
                    value = 0                                                                                                                
            elif ftype == type([]):                              
                value = request.POST.getlist(field)                            
            else:                                                              
                value = str(request.POST.get(field, ''))                                                                                                                                                                   
            setattr(instance, field, value)

Crude but it allows one line form population for editing ...
mycontentform = MyContentForm(value_dict)
... and instance population for saving ...
write_attributes(request, instance)
However even after these fixes and data import, I still had another task. Images uploaded as content fields were not transferred - so these had to be manually redone. This is maybe my fault for not using the blobstore for them - ie since they were small images they are were just saved to the Master/slave data store - but pretty annoying even so.

Apps APIs

Finally there is the issue of gdata APIs being in a state of flux. Well currently the new APIs don't provide sufficient functionality and so given that this API move by Google still seems to be in progress - and how many changes the App Engine migration required - I think I will leave things be for the moment and stick with gdata-python ... maybe in a years time!

No comments:

Post a Comment