Ed Crewe: 2011

Friday, 23 December 2011

A potted history of web development paradigms

Recently I had reason to review how we had approached web development for CMS and small web applications at the University where I work. It made me ponder the wider context of the evolution of web development approaches. Having been involved in writing web applications since 1998, and remembering using Mosaic when the web started, I realise that I am a bit of a historical artifact myself - and always keen to rewrite history ;-)

Server side programming languages
So back to basics, the top seven languages (some way ahead of the many minor languages) in use for server side web development in rough order of current popularity for this sector with their birth year are:

Java 1994
PHP 1994
Python 1989
C# 1999
Perl 1987
Ruby 1993
Visual Basic 1991

That popularity order has shifted around a bit since the inception of the web, and a lot more new languages have entered the fray, perhaps most notably the Python/Ruby clones with Java styled syntax, such as Scala and Groovy. C# and Java have replaced use of C/C++, and Ruby has been popular enough to enter the top seven, due to the advent of Ruby on Rails in 2004.

Client side languages
Client side, things are much simpler. JavaScript (1995) has always ruled, and has seen off challenges from Java applets, Microsoft ActiveX and then Silverlight over the last decade. As a part of HTML5, via canvas etc., it is designed to replace Flash and other proprietary client UI tools.
It is also increasingly becoming the default language for the restful APIs of modern web / cloud services, where its JSON data format is supplanting the more heavyweight XML/SOAP (1999) APIs and web services architectures.
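To make the 'heavyweight' comparison a little more concrete, here is a minimal sketch using only the Python standard library. The record and its field names are invented for illustration, and the XML is a bare element tree rather than a real SOAP envelope, which would be larger still.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this illustration.
record = {"id": 42, "title": "A potted history", "tags": ["web", "history"]}

# JSON: the data structure maps directly onto the wire format.
as_json = json.dumps(record)

# XML: every field needs an element, and lists need a wrapper convention.
root = ET.Element("record")
ET.SubElement(root, "id").text = str(record["id"])
ET.SubElement(root, "title").text = record["title"]
tags = ET.SubElement(root, "tags")
for tag in record["tags"]:
    ET.SubElement(tags, "tag").text = tag
as_xml = ET.tostring(root, encoding="unicode")

# Round-tripping JSON gives back native types; XML gives back strings.
restored = json.loads(as_json)
id_from_xml = ET.fromstring(as_xml).findtext("id")
```

Here restored is equal to the original dictionary, integer and list included, whereas id_from_xml comes back as a string that the client has to convert itself.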

The dawn of web development, CGI and Perl
The common gateway interface was defined as a standard for calling programs from a web server in 1993, the same year HTML form elements first appeared, although they were not standardized until HTML2 in 1995. The position of Perl at the time as the ubiquitous Unix scripting language - and in particular its suitability to text parsing - soon led it to dominate early web applications.
In the early days applications might be little more than a form to email handler or such like. If rendering of HTML by the code was required, it would just be done by constructing a web page as a big string and sending it back.
Web development was more likely to be done by the unix administrator who would also cover content editing in their webmaster role.
The few more complex web applications were likely to be written in C.
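As a rough illustration of the 'big string' approach, here is a sketch in Python rather than the Perl of the period. The function names and form fields are made up; the only real convention shown is the CGI one of writing headers, a blank line, then the body.

```python
def form_to_page(fields):
    """Build a whole HTML page as one big string, early CGI style."""
    page = "<html><head><title>Thanks</title></head><body>"
    page += "<h1>Form received</h1><ul>"
    for name, value in fields.items():
        # No escaping is done here, a common bug (and security hole) of the era.
        page += "<li>" + name + ": " + value + "</li>"
    page += "</ul></body></html>"
    return page


def cgi_response(fields):
    # A CGI program writes headers, a blank line, then the body to stdout.
    return "Content-Type: text/html\r\n\r\n" + form_to_page(fields)


if __name__ == "__main__":
    print(cgi_response({"name": "Ed", "email": "ed@example.com"}))
```

Even at this size the markup is hard to see amongst the string handling, which is the problem the following approaches set out to solve.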

The rise of embedded scripting, PHP, CFM, ASP, JSP
PHP began in 1994, rapidly followed by ColdFusion in 1995, Active Server Pages in 1996, then JavaServer Pages in 1999. This era saw the beginning of dedicated web developers.
As web applications began to grow a little more sophisticated, the big string approach was proving unwieldy.
Server side includes were already in use for limited embedding of code in web pages. But even for applications where significant code was required, there was still often less code than page content being returned. So rather than create all this HTML within the code, more web specific scripting languages that embed code in HTML arrived, led by PHP.
As web traffic increased the process per request of CGI gave way to web server modules that could deliver better performance by pooling resources.
Eventually this development paradigm replaced the use of CGI scripting.

Web frameworks are born, cold fusion and zope
The most significant early web frameworks still surviving (just!) today are ColdFusion, started in 1995, and Zope, which followed in 1996. The early frameworks bundled up standard APIs for handling requests, forms, data storage, email etc. with embedded scripting languages. Zope was the more complex, bundling an object NoSQL database and a publishing paradigm based on traversing objects and calling methods on them via the URL space.
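The object publishing idea can be sketched in a few lines of Python. This is only a toy to show the principle, with invented class and function names; it is not Zope's actual API, and it has none of Zope's security or acquisition machinery.

```python
class Folder:
    """A container whose children are reachable by name."""

    def __init__(self, **children):
        self._children = children

    def __getitem__(self, name):
        return self._children[name]


class Document:
    def __init__(self, title):
        self.title = title

    def view(self):
        return "<h1>%s</h1>" % self.title


def publish(root, path):
    """Walk the URL path segment by segment, then call what we land on."""
    obj = root
    for segment in path.strip("/").split("/"):
        if not segment:
            continue
        if isinstance(obj, Folder):
            obj = obj[segment]
        else:
            # Never expose private attributes through the URL space.
            if segment.startswith("_"):
                raise KeyError(segment)
            obj = getattr(obj, segment)
    return obj() if callable(obj) else obj


root = Folder(docs=Folder(intro=Document("Welcome")))
page = publish(root, "/docs/intro/view")
```

So the url /docs/intro/view walks two folders to reach a document, then calls its view method, with no separate routing table in between.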

Use of VM based server runtimes, Enterprise software
Java fully entered the web world in 1997 with the first Servlet implementation, and .NET C# in 2000. They employ similar architectures using languages that precompile to byte code for running on a virtual machine that communicates via a standard API to the web server.
Part of the purpose of these architectures was to allow for the same paradigms, IDEs and development practices to be used for web applications as were already established for desktop development. Hence common elements such as database APIs could be leveraged. This also helped usher in more formal development practices brought from existing OS based application development.

Through the web development and content management systems, TTW & CMS
PHP has produced the most CMS, both generic and more specialist, an early example being TYPO3 in 1997. Today there are thousands of CMS in existence catering for all sorts of niches. Many CMS are either built on top of frameworks, or have ended up verging towards becoming them, as their features become increasingly generic and pluggable. Their approaches and technologies differ widely; however, the key common development practice that tends to distinguish them is that since they cater for through the web content editing, they tend to add code and configuration editing TTW too. This is fine for quick prototyping but has proved unviable for long term code maintenance and scaling. Hence the more mature / large scale CMS have evolved features that allow for all code to be pushed out to the file system, along with configuration (usually as XML) for versioning and release management. There is now also a cross platform standard for content and authorisation data transfer, CMIS. These various elements allow separation of concerns and delivery, which may usher in a new era: various CMS for different purposes (intranet, blog, portal, wiki, VLE, CRM, specialist etc.) acting as content repositories locally or in the cloud, loosely coupled to front end delivery and integration web sites that are separate from the content creation and administration software.

Templating languages
As soon as web applications began to become more complex, the problems of embedding raw code in HTML, ie the early PHP, ASP, JSP approach, began to reveal themselves. Mixing content and code so tightly together doesn't scale to larger applications, so templating languages were invented to help separate the presentation from the logic layer.
ClearSilver was one of the first templating languages, dating from 1999 and originally targeting larger web apps written in C/C++. This was followed by Apache Velocity in 2001 and Zope page templates, TAL, in 2002. Since then templating engines have proliferated, and it is now unusual to develop a web application that delivers markup - whether HTML, XML, SVG etc. - without one. Hence raw embedded scripting has largely been superseded.
In general there are two schools of thought with templating engines.
The first school's primary aim is for templates to allow full markup validation prior to code rendering. This approach is closer to embedded scripting since it caters for snippets of the server side language.
The second school aims for templating languages to be as simple as possible and doesn't allow embedded code, limiting the capabilities of template tags to ensure clean separation of logic from the template presentation layer. This approach doesn't usually provide pre-rendering validation.
Examples of the former are TAL, Genshi, Facelets. Whilst the latter include Django templates, Velocity, Closure, Smarty.
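For a feel of the second school, Python's standard library string.Template is about as minimal as a templating language gets: named placeholders and nothing else. The sketch below is only an illustration of the principle, with made up template and function names, and is not how any of the engines listed above are implemented.

```python
from string import Template

# Presentation: templates with named placeholders and no logic at all.
PAGE = Template("<html><body><h1>$title</h1><ul>$items</ul></body></html>")
ITEM = Template("<li>$name</li>")


def render_projects(title, names):
    # Logic: looping and data preparation stay in Python, outside the template.
    items = "".join(ITEM.substitute(name=n) for n in names)
    return PAGE.substitute(title=title, items=items)


html = render_projects("Portfolio", ["Plone site", "Django app"])
```

Real engines of this school add loops and conditionals as restricted template tags, but the division of labour is the same: the template cannot run arbitrary server side code.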

MVC web frameworks replace embedded scripting
Although the model view controller pattern dates from the early days of GUI at Xerox with Smalltalk in the 70s, it took a surprisingly long time to reach its position of dominance in web development. The early web frameworks of the 90s mentioned above didn't use it, but from their introduction in 1999 Java's JSP pages posited a model2 (MVC) approach rather than the embedded (model1) default. The first successful MVC framework was Apache Struts (2000), implemented using JSPs as the view and servlets as the controller. These and other early Java MVC frameworks utilized the pattern to an extent but without a full stack implementation, hence there was no ORM for full object abstraction of data storage as models. Although tag libraries could reduce embedded Java, using JSP as the view rather than a template library led to poor logic separation.
It was Ruby on Rails in 2004 that really proved the effectiveness of a full stack MVC framework. It was followed by the other 4 leading web MVC frameworks in 2005: django, cake, symfony and codeigniter. Perhaps surprisingly there is no full stack purely MVC Java framework. JSF is missing ORM features and Spring is a heavier weight framework (not solely for the web), but there are optional MVC layers that can be run on it, Spring MVC and now Grails - although here development is with Groovy not Java. Finally Microsoft launched ASP.NET-MVC in 2007.
The features and productivity that a full stack framework can provide are significant, usually at least double that of developing a fully bespoke solution. So increasingly purely bespoke development is only for specialist areas and large commercial applications. As a result it has dwindled as a proportion of web development, especially wrt. embedded scripting, and largely been replaced either by adapting generic applications, such as CMS, or using MVC frameworks.

Some near futures for web development?

As web development becomes the predominant software medium then more formal development practices will hopefully become the norm. The use of continuous integration, testing and release management - possibly via continuous deployment methodologies - is likely to become much more widespread, driven by its match with the feedback loop of usability and user centered design.
HTML5 and AJAX based rich client user interface design is currently in its infancy, but it looks likely to replace most other client approaches, reversing the current short trend to return to platform specific development with mobile - Android, Objective C, etc.
Cloud services are likely to continue their rise, so development that uses restful APIs to create integrated mash up solutions is likely to become more standard. These may gradually replace the current use of local generic applications, such as CMS, with cheap or free cloud hosted alternatives.
Full stack frameworks and other solutions that help increase productivity are likely to continue their rise - with more complete end to end features, ie data modeller (ORM) to client (HTML5) libraries included. Possibly more of these will be used in a software as a service manner, along the lines of Google Apps or Microsoft Azure.
The rise of automation and the cloud will result in the concept of small, medium enterprise scale (SMB/SME) fading - along with local installation / licensing - to be replaced by web scale platform and software as a service, charged by the scale and uptime requirements.

Sunday, 7 August 2011

Dashing off an egg

Last month I sacrificed a weekend to python coding, having got a 2 day pass from the kids, to take part in a django dash. The local DBUG community organised three teams to take part at the House of Omni, in Bristol. As it turned out the dash had to be delayed (results due next week), but being unable to rearrange, the Bristol teams just went ahead with our own dash anyway.

The idea behind the dash is to put together a working and hopefully useful django site within a strict 48 hour time period - competing to be judged the best against other teams of 2-3 developers / designers. Same as PyWeek but shorter and it doesn't have to be a game! Hence the competitive, and non-framework focussed, elements distinguish a dash from a sprint.

I formed a team with Tom Dunham and we worked on a project Tom had found posted by Fraser Stephens of the Helios Foundation, to create a demonstrator for the sharing of supply chain data between aid agencies. The idea was to use a web based database that would allow a range of organisations to upload their inventory details along with location data to cater for the easy pooling and exchange of resources on the ground - hence aiding rapid targeted distribution.

Although there is an existing HELIOS supply chain system, this one was aimed at being a light weight alternative for groups that may not have HELIOS or other full inventory databases in place. So instead they could use the lowest common denominator of exporting databases or spreadsheets to CSV files, for manual or automatic scheduled synchronisation.

For ease of working we decided to split the system into two eggs (one each), the project specific one that sets up the django site and models, etc. and a generic CSV import egg.
We just about managed to deliver a prototype, which is on the HELIOS site.

Luckily, since we weren't bound by the rules of the real competition, I could make sure it was up and running after the 48 hour deadline. Perhaps more importantly, since I worked on the CSV import egg, I could add some documentation and release it as django-csvimport.
If you do use it, let me know of issues, so I can refine it.

It's just a starting point, but in time it hopefully should result in fewer django developers wasting time writing custom CSV import scripts. It also meant that I became more familiar with character encoding in python and some of the useful relevant libraries in this area, ie the core csv library, chardet and xlrd.
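The encoding problem looks roughly like this. The sketch below is not the django-csvimport code; it uses only the core csv library, and where chardet would guess the encoding from the bytes, it simply tries a fixed list of encodings in order (an assumption made to keep the example free of third party dependencies). The function name and sample columns are made up.

```python
import csv
import io


def read_csv_bytes(raw, encodings=("utf-8", "latin-1")):
    """Decode uploaded bytes by trying encodings in order, then parse rows as dicts."""
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("could not decode CSV data")
    # newline="" lets the csv module handle line endings itself.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [dict(row) for row in reader]


# A spreadsheet export in a legacy encoding, as an aid agency might upload.
raw = "item,location\ntents,Nairobi\ncafé,Genève\n".encode("latin-1")
rows = read_csv_bytes(raw)
```

Note that latin-1 will decode any byte sequence, so it only makes sense as the last resort in the list; that weakness is the reason for using a proper detection library.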

I am ashamed to admit that I didn't know until I needed to write some unicode parsing tests that in order to be able to use readable non-ASCII characters in python code (rather than escape sequences), you just add the appropriate encoding line to the top of your file

# -*- coding: utf-8 -*-

... I guess it's an ignorance born of being a linguistically challenged native English speaker :-}
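A small sketch of the difference it makes in a test. The sample strings are my own, and note that in Python 3 UTF-8 is already the default source encoding, so there the declaration is optional.

```python
# -*- coding: utf-8 -*-
# With a UTF-8 source encoding, literals can be written readably.
readable = "Genève café"

# The same text written with escape sequences, which is harder to review.
escaped = "Gen\u00e8ve caf\u00e9"

same = readable == escaped
```

Both spellings produce the identical string, so the choice is purely about how readable the test data is.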

Sunday, 12 June 2011

KISS Djangocon.eu

Last week I had a great time in Amsterdam at Djangocon.eu, and many thanks to everybody involved in the event, especially the organisers.
Many useful talks and it was energizing to see an event and platform growing as Django is doing, with lots of young new developers coming on board.
I gave a short talk myself, about how I am using Django at the University of Bristol as an ideal lightweight agile framework for glueing together components that have traditionally been labelled as 'Enterprise' architecture. Traditionally SOAP web services and 'Business' objects, are sold as the necessary integration components. Complex environments and problems requiring complex (commercially named) technologies as solutions. The gist of my talk was that just because it is a complex problem it doesn't need a complex solution. Integration technology should be as simple and decoupled as possible to be robust.
This is really just the old engineering principle, KISS (Keep It Simple Stupid). Examples in IT trends are the decline of SOAP in favour of restful approaches, the rise of the cloud - and social media over more workflowed institutional CMS. NoSQL replacing large relational databases. Simple component services perhaps tied together just with javascript and JSON, rather than complex integration APIs and XML, tightly coupled to relational schemas.
Applying the same KISS ideals to the world of python, then it appears similar lessons are being learned. From a recent blog posting, by Mikko Ohtamaa, Plone is attempting to shed complexity, whilst as Martijn Faassen alluded to in his talk - zope found that a full rewrite to add java style interface adaptor patterns via XML, to zope's already overly full basket of patterns, was a step too far.
To finish Russell Keith-Magee gave a Roadmap talk, appealing to developers to get on board, and for Django to not suffer from a clique mentality. The approach of tell us where you want Django to go, and if you help out, it will. In response, I sincerely hope that Django remembers that KISS is its mission, and makes it a point of principle to not increase its core size. If anything components such as the admin should probably be taken out to develop separately - along with any other more CMS specific components. Storage too, could be pluggable as relational or no-rel, etc. with just SQLite by default.

PS: For those that attended, apologies over my talk. My netbook proved to have been rather too bashed around by the kids, and the graphics card couldn't cope with the multi-head projectors. Hence I had to borrow one from the audience, so an already rather ambitiously squeezed talk had to be skipped through in even less time. I hope the result (and diagrams) were not too impenetrable, maybe next time I can deliver something a bit clearer!

Sunday, 29 May 2011

Using App Engine and Google Apps as a CMS

I recently submitted a conference talk, and the form asked for the usual home page url. I tend to use my blog, but I also have stuff scattered around in other places and it made me realize it was probably a good idea to stick this stuff together somewhere. Given that I am not Ryan Giggs it is unlikely that this site would attract a vast amount of public interest, so what I wanted was a fairly simple CMS, that was free for low traffic, and could pull in content and documents from other sites or systems.

For the standard page, image and file CMS functionality there are now a number of free Wikis / simple CMS, such as Wikia or Google Sites. These tend to have sufficient features, usually no workflow, but often content versioning, which is a bonus. So why not just use one of these out of the box? Well, where's the fun in that? ... but more seriously they don't cater for full design freedom. They often have fixed templates, and they also tend to have very limited, if any, custom content types.

I needed a project type for populating with a portfolio of past work, with image fields and categorisation lists. I also want to pull in some of my documents, and drawing and painting is a hobby of mine, so I fancied adding some of my efforts via a photo sharing site.

Finally there was this blog. Currently it's in WordPress hosted at my workplace, but I may not be there forever, so I thought it was time to migrate to an external service - and have the advantage of some extra tools and integration capabilities as well.

Although happy to do some development, given that this site was a spare time task, I didn't want to spend days on integration code. The leading choice that combines all these tools with a standard well documented API is Google Apps - I plumped for that. But the same principles can be applied to any mainstream hosted content service, since they should all have an API if they are any good. For example there is a python library for Flickr.

So to start mashing up all this Google-ness, I created a site in App Engine's adapted Django which holds the aggregation code for the site, and caters for full customisation and complex content types, such as the portfolio projects.

With inspiration from Django's flatpages app I added a middleware handler to check for 404s in response to requests to the site. Any content not found in App Engine falls back to Google Sites, and the page is retrieved from there via the gdata API. So in effect you have an admin interface for editing, which is the bare untemplated Google Site, and the public interface for display via App Engine.
Google Sites has the advantage of already being integrated via common authorisation with Google docs, Picasa, YouTube etc. so dropping in slide shows of photos, or Office documents, is trivial.
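The fallback idea is simple enough to sketch without any framework. What follows is not the site's actual middleware, nor Django's middleware API or the gdata client; the class, the handler signature and the two dictionaries standing in for App Engine and Google Sites are all hypothetical.

```python
class FallbackOn404:
    """Wrap a primary handler; if it returns 404, try a fallback content source."""

    def __init__(self, primary, fetch_fallback):
        self.primary = primary                # callable: path -> (status, body)
        self.fetch_fallback = fetch_fallback  # callable: path -> body or None

    def __call__(self, path):
        status, body = self.primary(path)
        if status != 404:
            return status, body
        remote = self.fetch_fallback(path)
        if remote is None:
            # Neither source has the page, so the 404 stands.
            return 404, body
        return 200, remote


# Stand-ins for the local app and the remote site (both hypothetical).
LOCAL_PAGES = {"/portfolio": "local portfolio page"}
REMOTE_PAGES = {"/about": "page edited in the hosted CMS"}


def local_app(path):
    if path in LOCAL_PAGES:
        return 200, LOCAL_PAGES[path]
    return 404, "not found"


site = FallbackOn404(local_app, REMOTE_PAGES.get)
```

In the real thing the fallback fetch is a network call, so caching the retrieved pages would be the obvious next step.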

The gdata API is ATOM based and extensive, with a full python library wrapper. It caters for reads or writes of content, ACL, revisions, templates etc. for the various component Apps. Alongside it is the OAuth protocol, acting as the single sign on system to bind together the authentication and authorisation. It raises the potential of developing a much larger scale CMS, by developing tools to automate site creation and integration from Google App domains of thousands of users.

But back to my home site. For my limited requirements the job was relatively trivial, so I would recommend the Google approach for simple low traffic CMS. But before this becomes a pure advert for Google, I did come across a few random gripes along the way.

The default gdata API wrapper for sites doesn't return all the content via entity.content.html - you have to pass in a custom handler function.
Upgrading from Ubuntu 10 to 11 broke the GAE dev server, due to this issue!
GAE's python is 2.5 (until later this year) and its django uses custom data types and other customisations - hence django-norel for unadulterated Django on App Engine.
A custom list widget was needed to handle BigTable's list type.
Image uploads from a HTC Desire to Picasa break. The broken images also caused picnik to break until chrome's cache was fully flushed.
Distinguishing public links and edit links in docs has bad usability - eg. any share link is not the view link.
Quotas are getting lower soon for free AppEngine - but via gdata you can circumvent this with the much larger Apps ones - e.g. send mails via the gdata GMail API, not the GAE mail API.
The wake up time for the site can occasionally be pretty slow, ie. 5 seconds or more, to bring up a dynamic instance if traffic is so low, it regularly shuts down. Of course you can pay 20p a day for always on.
Neither Google Sites' nor Blogger's WYSIWYG editors produce clean minimal html - instead they seem to churn out loads of divs filled with embedded styles - that may validate but they are bloated and a designer's nightmare.

Any how the site is now up, http://www.edcrewe.com, although it could do with some more work on content and design, when I can squeeze it in. At least all my stuff is in one place now :-)

Sunday, 27 February 2011

A website's content journey

I had an email from an old friend a few weeks ago. He had a site with a bunch of work related research pages and documents collaboratively edited by a handful of people, and had let its domain lapse. It was a 6 year old plone site that he and his fellow editors suddenly wanted back up, for another 5 years, but had no cash to do so. It also needed to move hoster. The first thing to do was re-register the domain, which costs around 40 dollars for a cut price option - now I needed the site back to point it to. Initially I assumed a static dump was bound to be the quickest and cheapest option.

I had a look at some static site python tools such as hyde, a django based mirroring tool inspired by Ruby's Jekyll, where simple database apps such as blogs can be dumped to static files for performance whilst still being editable. However my friend was not a techie so was unlikely to be able to cope with file system editing of text 'content' files. So for the time being I just ran httrack over the site to dump it to the file system. Next I copied it over to a 'free' amazon micro instance. Since this was a static site, using Apache also seemed overkill, and I thought it was long overdue that I tried out nginx.
However there proved to be almost nothing to try out, since the default Amazon AMI comes with an nginx package. All you need do is add a new micro instance, start it up, run
>sudo yum install nginx
and copy the contents of a httrack dump of the site to /usr/share/nginx/html
That's it. It was very fast and the config was very simple. A big thumbs up for nginx then, and I also quite like its Russian constructivist styled website, especially now the original Russian only documentation has many translations ;-) The final stage was to assign an Amazon elastic ip to the instance and point the domain registration at that ip.

Great, the site was back and seemed pretty nippy. However there were two problems - firstly it was no longer a CMS, and secondly Amazon's free micro instances are actually only available as a month of uptime hours free trial. After that the hosting was a lot more expensive than a standard minimal shared hoster, and neither option was free. So if hosting was to be paid for I might as well do a proper job and upgrade the plone site to current plone 4, so making it a CMS again.

Fortunately I released a tool that does just that, called ilrt.contentmigrator, a year or so ago.
It takes content from old plone (eg 2.0) and exports it to a simple email style format of content and metadata, that can be reimported to a new plone site. The only problem was I hadn't updated all the tests and bits and pieces to release a plone 4 version yet. But since 4 had been out for some months, it was high time that I did, and this was the excuse I needed. I got the upgrade done and exported the site, where it ran happily in plone 4.
So now I had a working CMS back up on the old domain, and could run it up on an Amazon micro as a fully featured CMS again. So I emailed my friend - it's back, you can edit it, only problem - it's going to cost about 20 dollars a month.
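By 'email style format' I mean metadata as headers followed by the content as the body. The sketch below shows the general idea using Python's standard email library; it is not the actual ilrt.contentmigrator file format, the header names are made up, and it only handles plain ASCII text.

```python
from email import message_from_string
from email.message import Message


def export_item(metadata, body):
    """Serialise one content item as headers (metadata) plus a body (content)."""
    msg = Message()
    for key, value in metadata.items():
        msg[key] = value
    msg.set_payload(body)
    return msg.as_string()


def import_item(text):
    """Parse the exported text back into (metadata, body)."""
    msg = message_from_string(text)
    return dict(msg.items()), msg.get_payload()


# Hypothetical metadata fields for a single page.
exported = export_item({"Title": "Research notes", "Type": "Document"},
                       "Some page text.\n")
metadata, body = import_item(exported)
```

The appeal of the format is that each exported item is a readable text file, so a migration can be inspected, or fixed by hand, between the export and the import.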

Ahhh, now of course I should have recalled that one of my friend's defining characteristics was being a tightwad - the idea of paying hundreds of dollars over the next 5 years meant the site was effectively down again! So back to the drawing board. Ok, so with all these free services / cloud technologies out there these days, there must be a cheaper solution. A quick hunt around and the answer was obvious: the CMS had no sophisticated features, so a free Google site would easily cover my friend's requirements, whilst not requiring the dropping of the site-like structure and collaborative document nature that a simpler blog, such as a WordPress solution, might do.

So set up a Google site, now put the content in. Well of course I could just tell my friend to cut and paste it all, but Google has a pretty extensive data and provisioning API. I had already written a content migrator for Plone. Why not make it work between Plone and Google sites API as well. So using the python wrapper for the restful Atom feed based Google data APIs, I added an export tool that writes the basic content types and folders from Plone to a Google site.
The two share in common the storage paradigm of a NOSQL database and a folder like interface to content creation. Plone has an inherent folder like storage paradigm at an internal level implemented via its acquisition mechanism within the ZODB, whilst Google sites have a much thinner skin of folder like behaviour added by parent child node properties to the objects stored in its BigTable hash table cloud (the shared storage behind site, apps, app engine etc.)
As it turned out this meant that the writing of a migration tool to push the more metadata rich content from Plone to Google was quite straight forward. I rewrote the import to Plone script as an import to Google one, using the gdata library. So the site was up as a Google site. Change that domain ip again, and my friend had his site back, for free, hurray job done.
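The 'thin skin' of folder behaviour amounts to rebuilding a tree from flat records that each name their parent. Here is a minimal sketch of that step; the record fields (id, title, parent) are invented for illustration and are not the gdata entry structure.

```python
def build_tree(entries):
    """Turn flat records with parent ids into nested folder-like dicts."""
    nodes = {e["id"]: {"title": e["title"], "children": []} for e in entries}
    roots = []
    for e in entries:
        parent = e.get("parent")
        if parent in nodes:
            nodes[parent]["children"].append(nodes[e["id"]])
        else:
            # No known parent: treat as a top-level item.
            roots.append(nodes[e["id"]])
    return roots


def paths(tree, prefix=""):
    """List folder-style paths for every node in the tree."""
    result = []
    for node in tree:
        path = prefix + "/" + node["title"]
        result.append(path)
        result.extend(paths(node["children"], path))
    return result


# Hypothetical flat listing, as a content feed might return it.
flat = [
    {"id": "1", "title": "research", "parent": None},
    {"id": "2", "title": "papers", "parent": "1"},
    {"id": "3", "title": "draft.doc", "parent": "2"},
]
site_paths = paths(build_tree(flat))
```

Going the other way, from Plone's real folders to parent properties, is just a matter of recording each object's container as it is exported.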

However I couldn't quite leave it there. I had written a tool to move simple Plone sites to Google for free hosting. But there was probably at least as big a use case for moving Google sites, with their more limited design, content types and workflow, to Plone, when those sites' customisation demands have outgrown their Google site origins. On top of that I should really write some tests and package things up properly to add these features to my new ilrt.contentmigrator release.
As it turned out migrating from Google site to Plone was a little harder, partly because the Google sites restful Atom API doesn't expose the folder tree layer of content by default. So all content is available from the root feed, but it misses empty folders out. Also there seemed to be a bug with getting the summary text of folders (or filecabinets as Google sites calls them). I guess the API is still in Labs status so this may be fixed in time.

Anyhow I have released it as a first version for the standard folder, pages and file attachment types. So I hope somebody has reason to use the tool's new features, and can give me any feedback when they do.

Monday, 3 January 2011

Cloud snakes

I have been using Python technologies relevant to the cloud for the last few years. But I am not a freelancer so have had less reason to get my hands dirty with it, however I have tried out some cloud deployments with Google App Engine and Amazon. Before I went any deeper I thought it makes sense to do a little research on the current state of the various cloud providers - and related Python APIs and technology. This post summarizes my take on this.

What clouds?

By their nature clouds are woolly and ill defined, so what do I mean by the cloud? I mean a large distributed tcp/ip network of auto-configured virtual servers and related storage resources where usage is metered (and can be charged - rather than charging for the hardware or software directly).

Essentially the cloud can be split up into two clouds, software as a service cloud provision (SaaS), and infrastructure as a service (IaaS) provision. SaaS is also somewhat hazy since it isn't specific to the cloud and could really include any software that allows user input and is delivered via distributed farms of servers and storage, from WordPress and Facebook to more obvious examples such as Google Docs, Salesforce and BaseCamp.

On the basis that SaaS systems and tools are too numerous and diverse to delve into, I am leaving them out of this discussion. However anything that provides a cloud infrastructure below the application level is in the remit, so this includes the derived form of SaaS, ie. platform as a service, PaaS.

Platform as a Service - PaaS

The two biggest PaaS contenders are Google App Engine and Microsoft Azure. Although it could be reasonably argued that Amazon EC2 makes the creation of any custom Platform as a Service sufficiently simple to occupy the cross over territory between PaaS and IaaS. Whilst Rackspace may be second only to Amazon in the league of IaaS cloud providers, it is moving to cloud hosting from a more traditional managed hosting background so its PaaS is perhaps not yet strictly of the cloud variety.

Microsoft's Azure is the smallest cloud of these four companies and, although there are examples of IronPython Azure usage, has the least python relevance. So suffice to say that if you want your .NET apps in the cloud then Azure may be easier for you than using Amazon's .NET APIs and Windows options.

Google App Engine originated as a Python WSGI only hosted platform, which now also has a Java option. This runs on the cloud that supports GMail and Google Docs which also have a Python or Java API (v3), that GAE hosted apps can integrate with.
The ease of use and initial free quota charging model of GAE make it well suited to smaller load or trialling applications and pure developer users. Whilst the automated multi-instances, memcaching and BigTable architecture can also make scaling simple and an economic choice for infrequent peak traffic sites. Higher load sites may be more economically sited with Amazon or Rackspace.
All the associated automation tools are Python based, and the Python version of the platform itself comes packaged with Django. So setting up a Django site takes very little effort, with the local dev server adapted to simulate Google's cloud features, such as BigTable. A single YAML file is used for configuration and a one liner seamlessly brings up a new code instance to replace the old.
(My trial of GAE led to a site called placeUvote and a presentation explaining the basics of the GAE PaaS).

An example of a less mainstream form of PaaS is for Grid computing style usage, PiCloud, for running high processor load computations and tasks. There are also numerous third party suppliers of systems for specific web platforms that do the work of plugging into Amazon, Rackspace etc. for you. Hence provide platform as a service. On top of that there are a variety of cloud deployment wrappers released for particular frameworks or CMS - so Drupal has the Python based Amazon targeted Pantheon for instance. These give you a foot up in doing the task yourself.

Infrastructure as a Service - IaaS

Why Python for cloud tools? Well, essentially replace the term cloud with 'large distributed farms of auto-configured infrastructure and servers'. Then ask which language is currently most actively used for software configuration management, package handling, deployment shell scripting, etc. - Python, followed closely by Ruby. So most of the leading cloud vendors build their automation in either Python (Google, Rackspace) or Ruby (Amazon), with the ubiquitous Java common for user client libraries.

Both languages also sport shell frameworks and configuration management systems, with the leading Python ones being Fabric and Bcfg2 (intros to Fabric and Bcfg2). These types of tools provide the pull and push mechanisms that deploy the cloud, whilst others provide the instrumentation and associated API that give customers an interface for external usage, and the metering of it. Hence to make more than small scale use of cloud services, integrate them with locally hosted systems and develop an efficient and robust architecture with them, organisations need to move to a similar approach, with more development using these same Python or Ruby tools required in sys-admin roles. Automation built for internal servers can then also be used for config management of Amazon Machine Images (AMIs) or other vendors' IaaS virtual servers (see OpenStack below).

The tools you choose depend on the scale of the task - integrating a large organisation's internal servers with a cloud provider would require a complex configuration management set up, with the configuration engine able to drive server and software deployment across both domains. Deploying one or two servers to deliver a particular client solution, by contrast, may take no more than some well crafted shell framework based scripts.

For the latter there are a number of simple cloud specific tools that use Python's remote SSH library, paramiko (as does Fabric). An example of this is Silver Lining. A few more of the cloud libraries and API wrappers in Python are listed below...
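
As a rough sketch of what such a 'well crafted script' looks like underneath, the following pushes a list of shell commands to one server over SSH with paramiko. The function names, the deploy user, the /srv directory layout and the symlink-per-release scheme are all hypothetical choices of mine for illustration, not how Fabric or Silver Lining do it.

```python
import shlex


def deploy_commands(app_name, release, base_dir="/srv"):
    """Build the shell commands for a (hypothetical) symlink-per-release layout."""
    target = "%s/%s/releases/%s" % (base_dir, app_name, release)
    current = "%s/%s/current" % (base_dir, app_name)
    return [
        "mkdir -p %s" % shlex.quote(target),
        "ln -sfn %s %s" % (shlex.quote(target), shlex.quote(current)),
    ]


def run_remote(host, commands, user="deploy", key_filename=None):
    """Run each command on the host over SSH, stopping at the first failure."""
    import paramiko  # third party; imported here so deploy_commands works without it

    client = paramiko.SSHClient()
    # Accepting unknown host keys is convenient for a trial, not for production.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, key_filename=key_filename)
    try:
        for command in commands:
            _stdin, stdout, stderr = client.exec_command(command)
            stdout.read()  # drain output so the command can finish
            if stdout.channel.recv_exit_status() != 0:
                raise RuntimeError("%r failed on %s: %s"
                                   % (command, host, stderr.read().decode()))
    finally:
        client.close()
```

Keeping the command building separate from the SSH transport means the same list can be checked locally, or looped over several hosts, before anything touches a real server.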

IaaS Clouds

If you want to avoid vendor tie-in or abstract your code from vendor APIs then an independent library for cloud configuration is a good starting point. The leading option here is libcloud.
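
A minimal sketch of the idea, assuming the apache-libcloud package is installed: code written against the driver interface does not care which vendor sits behind it. The helper names connect and node_names are mine, and the provider constants and constructor arguments accepted vary between libcloud releases, so treat the details as assumptions to check against the version you have.

```python
def connect(provider_name, key, secret):
    """Return a libcloud compute driver for a provider such as 'RACKSPACE' or 'EC2'."""
    # Third party imports kept local so node_names below has no dependencies.
    from libcloud.compute.providers import get_driver
    from libcloud.compute.types import Provider

    driver_class = get_driver(getattr(Provider, provider_name))
    return driver_class(key, secret)


def node_names(driver):
    """Vendor neutral: relies only on the driver's list_nodes() method."""
    return sorted(node.name for node in driver.list_nodes())
```

So node_names(connect("RACKSPACE", user, api_key)) and the same call with a different provider name should need no other code changes, which is the whole point of the abstraction.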

Rackspace has python-cloudservers and python-cloudfiles, and is shortly due to release its first API bindings library in Python.

Amazon, meanwhile, is laden with tools and API wrappers for any of the mainstream languages, alongside Ruby and Python. A couple of months ago it started a free trial service offering so, like Google, users can dip their toe in the cloud at no cost. So a good starting point is to sign up and pick a micro (free) AMI, then try it out by following the getting started guide. There are AMIs available from applications' sites, or from Amazon's own AMI repository, which can be searched for Python; one of the largest external AMI repos is the cloud market. But beware initially that these may not necessarily function correctly with respect to SSH access etc., so it is best to use a standard Amazon one. However for production use you will want to create your own and move to automation for deployment; for Python the core library for this is boto. For AWS there is also a RESTful query API, a SOAP based interface and finally standard SSH based on keys. Log in as ec2-user and then use sudo to do root tasks (it uses the key by default).
The free micro instances may not suit production but they do provide a very useful resource for testing software across different platforms, or generally trialling server setups.
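
Pulling those steps together, here is a hedged sketch using the original boto package (version 2; the later boto3 has a different API): launch a micro instance, then build the SSH command for the default ec2-user login. The helper names, the us-east-1 region and the t1.micro instance type are my assumptions, and the AMI ID and key pair name are whatever you chose when signing up.

```python
def connect(region="us-east-1"):
    """Open an EC2 connection; boto reads credentials from its config or environment."""
    import boto.ec2  # third party, imported here so the helpers below stand alone

    return boto.ec2.connect_to_region(region)


def launch_micro(conn, image_id, key_name):
    """Start one micro instance and return the IDs of the instances created."""
    reservation = conn.run_instances(
        image_id, instance_type="t1.micro", key_name=key_name)
    return [instance.id for instance in reservation.instances]


def ssh_command(host, key_path, remote_command=None):
    """Build the ssh argument list for the ec2-user login, using sudo for root tasks."""
    args = ["ssh", "-i", key_path, "ec2-user@%s" % host]
    if remote_command:
        args.append("sudo %s" % remote_command)
    return args
```

The list returned by ssh_command can be handed straight to subprocess.call once the instance reports a public hostname.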

The Future?

A relatively recent project is OpenStack, whose aim is to develop a new open common set of standards for cloud providers, with a number of big players involved, led by Rackspace and NASA. The system is essentially the open sourcing of Rackspace's internal cloud system code and hence is all Python based, with the initial cloud 'operating system' and associated storage API the first components now available for download.

It remains to be seen whether this approach by Rackspace will challenge Amazon or Google to do something similar. However the appeal for organisations of being able to run internal virtual server farms and external clouds via the same automation code base is obvious - perhaps once this integrated approach to internal and external infrastructure becomes more commonplace, the adoption of OpenStack or other open source cloud automation frameworks will gather pace.

Ed Crewe
