Monday 3 January 2011

Cloud snakes

I have been using Python technologies relevant to the cloud for the last few years. As I am not a freelancer I have had less reason to get my hands dirty with the cloud itself, although I have tried out some cloud deployments with Google App Engine and Amazon. Before going any deeper I thought it made sense to do a little research on the current state of the various cloud providers, and the related Python APIs and technology. This post summarizes my take on this.

What clouds?

By their nature clouds are woolly and ill defined, so what do I mean by the cloud? I mean a large distributed TCP/IP network of auto-configured virtual servers and related storage resources, where usage is metered and can be charged for, rather than charging for the hardware or software directly.

Essentially the cloud can be split into two: software as a service (SaaS) provision and infrastructure as a service (IaaS) provision. SaaS is also somewhat hazy since it isn't specific to the cloud and could really include any software that allows user input and is delivered via distributed farms of servers and storage, from WordPress and Facebook to more obvious examples such as Google Docs, Salesforce and Basecamp.

On the basis that SaaS systems and tools are too numerous and diverse to delve into, I am leaving them out of this discussion. However anything that provides cloud infrastructure below the application level is within the remit, so this includes the derived form of SaaS, i.e. platform as a service (PaaS).

Platform as a Service - PaaS

The two biggest PaaS contenders are Google App Engine and Microsoft Azure, although it could reasonably be argued that Amazon EC2 makes the creation of a custom platform simple enough to occupy the crossover territory between PaaS and IaaS. Whilst Rackspace may be second only to Amazon in the league of IaaS cloud providers, it is moving to cloud hosting from a more traditional managed hosting background, so its PaaS is perhaps not yet strictly of the cloud variety.

Microsoft's Azure is the smallest cloud of these four companies and, although there are examples of IronPython usage on Azure, it has the least Python relevance. Suffice to say that if you want your .NET apps in the cloud then Azure may be easier for you than using Amazon's .NET APIs and Windows options.

Google App Engine (GAE) originated as a Python WSGI only hosted platform, and now also has a Java option. It runs on the same cloud that supports Gmail and Google Docs, which also have Python and Java APIs (v3) that GAE hosted apps can integrate with.
The ease of use and initial free quota charging model of GAE make it well suited to smaller load or trial applications and to pure developer users. The automated multiple instances, memcaching and BigTable architecture can also make scaling simple, and make GAE an economic choice for sites with infrequent peak traffic. Higher load sites may be more economically hosted with Amazon or Rackspace.
All the associated automation tools are Python based, and the Python version of the platform itself comes packaged with Django. So setting up a Django site takes very little effort, with the local dev server adapted to simulate Google's cloud features, such as BigTable. A single YAML file is used for configuration and a one-liner seamlessly brings up a new code instance to replace the old.
(My trial of GAE led to a site called placeUvote and a presentation explaining the basics of the GAE PaaS.)
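
To give a feel for what 'WSGI only' means in practice, here is a minimal sketch of a WSGI application using nothing but the standard library. It is not GAE specific: the names application and serve_locally are my own choices for illustration, and the YAML configuration and the GAE dev server are deliberately left out.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """Minimal WSGI callable: takes the request environ, returns the body."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings.
    return [body]


def serve_locally(port=8080):
    """Serve the app with the stdlib reference server (local trials only)."""
    with make_server("localhost", port, application) as server:
        server.serve_forever()
```

Calling serve_locally() and visiting http://localhost:8080/ should show the greeting. A framework such as Django ultimately exposes the same kind of callable to the hosting platform.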

A less mainstream form of PaaS is aimed at grid computing style usage: PiCloud, for example, runs high processor load computations and tasks. There are also numerous third party suppliers of systems for specific web platforms that do the work of plugging into Amazon, Rackspace etc. for you, and hence provide platform as a service. On top of that there are a variety of cloud deployment wrappers released for particular frameworks or CMSs, so Drupal has the Python based, Amazon targeted Pantheon for instance. These give you a leg up in doing the task yourself.

Infrastructure as a Service - IaaS

Why Python for cloud tools? Well, essentially replace the term cloud with 'large distributed farms of auto-configured infrastructure and servers'. Then ask which language is currently most actively used for software configuration management, package handling, deployment shell scripting, etc. The answer is Python, followed closely by Ruby. So most of the leading cloud vendors build their automation in either Python (Google, Rackspace) or Ruby (Amazon), with the ubiquitous Java common for user client libraries.

Both languages also sport shell frameworks and configuration management systems, with the leading Python ones being Fabric and Bcfg2 (intros to Fabric and Bcfg2). These types of tools provide the pull and push mechanisms that deploy the cloud, whilst others provide the instrumentation and associated API for the customer interface, covering external usage and the metering of it. Hence to make more than small scale use of cloud services, integrate them with locally hosted systems and develop an efficient and robust architecture with them, organisations need to move to a similar approach, with sys-admin roles requiring more development using these same Python or Ruby tools. Automation built for internal servers can then also be used for config management of Amazon Machine Images (AMIs) or other vendors' IaaS virtual servers (see OpenStack below).

The tools you choose depend on the scale of the task. Integrating a large organisation's internal servers with a cloud provider would require a complex configuration management setup, with the configuration engine able to drive server and software deployment across both domains. Deploying one or two servers to deliver a particular client solution, on the other hand, may take no more than some well crafted scripts based on a shell framework.

For the latter there are a number of simple cloud specific tools that use Python's remote SSH library paramiko (as does Fabric). An example of this is Silver Lining. A few more of the cloud libraries and API wrappers in Python are listed below...
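
As a rough idea of what such a paramiko based script boils down to, here is a minimal sketch. The function names run_commands and open_ssh_client are hypothetical, and it assumes the third party paramiko package is installed. This is not how Fabric or Silver Lining are implemented, just the bare pattern of pushing shell commands to a server over SSH and stopping at the first failure.

```python
def run_commands(client, commands):
    """Run shell commands in order over a connected SSH client.

    `client` is expected to behave like paramiko.SSHClient. Stops at the
    first command that exits non-zero and returns a list of
    (command, exit_status, stdout_text, stderr_text) tuples.
    """
    results = []
    for command in commands:
        _stdin, stdout, stderr = client.exec_command(command)
        # Drain the output before asking for the exit status.
        out = stdout.read().decode()
        err = stderr.read().decode()
        status = stdout.channel.recv_exit_status()
        results.append((command, status, out, err))
        if status != 0:
            break
    return results


def open_ssh_client(host, user, key_file):
    """Open a key based SSH connection (assumes paramiko is installed)."""
    import paramiko  # third party, imported lazily

    client = paramiko.SSHClient()
    # Accepts unknown host keys: convenient for trials, not for production.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, key_filename=key_file)
    return client
```

Typical usage would be something like run_commands(open_ssh_client(host, "ec2-user", key_file), ["uname -a", "sudo yum -y update"]), where the host, key file and commands are placeholders for your own.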

IaaS Clouds

If you want to avoid vendor tie-in or abstract your code from vendor APIs then an independent library for cloud configuration is a good starting point. The leading option here is libcloud.
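
The appeal of the libcloud approach is that the same code can talk to different vendors. A minimal sketch of that idea follows, assuming the third party apache-libcloud package is installed. The helper names make_driver and summarise_nodes are mine, and the credentials and options each provider expects vary, so check the driver you use.

```python
def summarise_nodes(driver):
    """Return (name, state, public_ips) for each node the driver can see.

    Works with any object offering libcloud's list_nodes() interface,
    whichever vendor sits behind it.
    """
    return [
        (node.name, node.state, list(node.public_ips))
        for node in driver.list_nodes()
    ]


def make_driver(provider_name, *credentials, **options):
    """Build a compute driver by provider name, e.g. "EC2" or "RACKSPACE".

    Assumes apache-libcloud is installed; credentials and options are
    passed straight through to the provider's driver class.
    """
    from libcloud.compute.providers import get_driver
    from libcloud.compute.types import Provider

    driver_cls = get_driver(getattr(Provider, provider_name))
    return driver_cls(*credentials, **options)
```

Swapping vendors then means changing the arguments to make_driver, while summarise_nodes and anything built on it stays the same.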

Rackspace has python-cloudservers and python-cloudfiles and is shortly due to release its first API bindings library in Python.

Amazon, meanwhile, is laden with tools and API wrappers for all the mainstream languages, including Ruby and Python. A couple of months ago it started a free trial service offering so that, as with Google, users can dip their toe in the cloud at no cost. So a good starting point is to sign up and pick a micro (free) AMI, then try it out by following the getting started guide. There are AMIs available from applications' own sites, or from Amazon's own AMI repository, which can be searched for Python; one of the largest external AMI repos is The Cloud Market. But beware initially that these may not necessarily function correctly with respect to SSH access etc., so it is best to start with a standard Amazon one. However for production use you will want to create your own and move to automation for deployment; for Python the core library for this is boto. For AWS there is also a RESTful query API, a SOAP based interface, and finally standard SSH based on keys. Log in as ec2-user and then use sudo to do root tasks (it uses the key by default).
The free micro instances may not suit production but they do provide a very useful resource for testing software across different platforms, or generally trialling server setups.
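
For a flavour of automating this with boto, here is a minimal sketch of launching a single micro instance. It assumes the original boto package (not its later successor boto3, whose calls differ) with AWS credentials already configured. The function names, the default region and the "t1.micro" instance type are my assumptions for illustration, and the image id and key pair name are yours to supply.

```python
def launch_micro_instance(conn, image_id, key_name):
    """Start one micro instance from an AMI and return the instance object.

    `conn` is expected to behave like boto's EC2 connection, whose
    run_instances() returns a reservation holding the new instances.
    """
    reservation = conn.run_instances(
        image_id,
        instance_type="t1.micro",  # assumed to be the free trial size
        key_name=key_name,         # the key pair later used for SSH
    )
    return reservation.instances[0]


def connect_to_ec2(region="us-east-1"):
    """Open an EC2 connection (assumes boto is installed and configured)."""
    import boto.ec2  # third party, imported lazily

    return boto.ec2.connect_to_region(region)
```

Once the instance is running, the SSH login as ec2-user described above applies, using the private half of the named key pair.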

The Future?

A relatively recent project is OpenStack. Its aim is to develop a new, open, common set of standards for cloud providers, with a number of big players involved, led by Rackspace and NASA. The system is essentially the open sourcing of Rackspace's internal cloud system code and hence is all Python based, with the initial cloud 'operating system' and associated storage API the first components now available for download.

It remains to be seen whether this approach by Rackspace will challenge Amazon or Google to do something similar. However the appeal for organisations of being able to run internal virtual server farms and external clouds via the same automation code base is obvious - perhaps once this integrated approach to internal and external infrastructure becomes more commonplace, the adoption of OpenStack or other open source cloud automation frameworks will gather pace.