<p>Ed Crewe's personal website blog (Ed Crewe, 11 December 2023)</p><h1 style="text-align: left;">Software development with Generative AI</h1><h2 style="text-align: left;">The Current State of AI Software Generation</h2><h4 style="text-align: left;"><span style="font-weight: 400;">The user describes what they want generated, as a snippet of high level programming language code, in standard English, and submits it to the AI tool. So what are they asking the AI to generate, and how does it do it?</span></h4><p style="font-size: medium;"><b>The high level language</b></p><h3><p><span style="font-size: small;"><span style="font-weight: 400;">High level programming languages are </span>human languages<span style="font-weight: 400;"> composed of English and maths symbols, designed for the comprehension and composition of precise computer instructions. The language makes no more sense to a computer than English does. It has to be compiled or interpreted into computer language before it can run. So it may compile to an intermediate bytecode language, and then perhaps to human readable assembly language, before final translation into the unreadable machine code that the computer runs.</span></span></p><p style="font-weight: 400;"><span style="font-size: small;">A programmer learns the high level language and becomes fluent in it. They can read and understand the functionality of that code, with the complexity of the machine specific implementation stripped away, leaving just the precise functional maths and English symbology that describes the computer functionality. 
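</span></p><p style="font-weight: 400;"><span style="font-size: small;">As a minimal sketch of that translation pipeline, Python's standard <code>dis</code> module can show the intermediate bytecode that the interpreter actually executes for a high level function:</span></p>

```python
import dis

# A trivial high level function: readable English-and-maths symbology.
def add(a, b):
    return a + b

# Disassemble it into the intermediate bytecode instructions
# that the Python virtual machine runs.
dis.dis(add)
```

<p style="font-weight: 400;"><span style="font-size: small;">The exact opcodes printed vary between Python versions, but none of them are meant for human fluency in the way the source line is.</span></p><p style="font-weight: 400;"><span style="font-size: small;">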
They think in that code, in order to write it.</span></p><p style="font-weight: 400;"><span style="font-size: small;">Unlike the English language, it can succinctly describe computer functionality in a few lines. <br />Even then, the majority of a programmer's time is spent debugging the high level language, fixing what they have written to be bug free, because it is difficult to think clearly in code, pre-determining all the edge cases etc.</span></p></h3><h3 style="text-align: left;">The AI</h3><h3 style="text-align: left;"><p style="font-weight: 400;"><span style="font-size: small;">A detailed English language description of the required functionality, plus the name of a high level programming language, are submitted to the AI tool.</span></p><p style="font-weight: 400;"><span style="font-size: small;">It does a search of the web, e.g. Stack Overflow etc., for results in that code language. For chatbot use (e.g. ChatGPT) it applies an English language Large Language Model, LLM (a numeric encoding of learning of the English language), to generate a well phrased aggregation of the most popular results that match the English prompt. </span></p><p style="font-weight: 400;"><span style="font-size: small;">For software use (e.g. CoPilot) it works in just the same way, but the LLM learns aggregate translation from English to a high level software language, from code example data such as 
github, to generate what the code syntax might be to match the English description of it.</span></p><p style="font-weight: 400;"><span style="font-size: small;">Finally it returns an untested snippet of generated high level code.</span></p><p style="font-size: medium;">The Non-Developer</p><p style="font-weight: 400;"><span style="font-size: small;">The non-developer pastes it in place and tries to run the program with it included.</span></p><p style="font-weight: 400;"><span style="font-size: small;">They may be able to puzzle out the high level language, but don't naturally think in it, just as people without mathematics skills can only think as far as basic arithmetic and are lost when it comes to complex equations.</span></p><p style="font-weight: 400;"><span style="font-size: small;">It seems to work around 50% of the time. When it fails, they go back to square one and try to rephrase their English prompt. <br /><br />They patch together block after block of prompt generated code. A crazy paving of a program that likely has a number of bugs and inappropriate features in it. But it kind of works, and for the non-developer that is good enough.<br /><br />The code gets pushed out there with all its imperfections, and starts to populate the web of code data that is used to generate the next AI code snippet.</span></p><p><span style="font-size: small;">Or The Developer</span><br /><br /><span style="font-size: small;"><span style="font-weight: 400;">The developer reads the code, understands it, and determines whether it should do what they want, or whether they just want to use some of it as an example.</span><br /><br /><span style="font-weight: 400;">They cut, paste and rewrite it, using it as a hint tool. 
Or an extension to their IDE's existing auto-code generation tools that work using templated code and language / import library searches.</span></span></p><p><span style="font-size: small;"><span style="font-weight: 400;">Hopefully their IDE is set up to clearly distinguish between real code completions and possible generative code completions, since otherwise the percentage of nonsense code created by the generative AI pollutes the 100% reliability of IDE code completion, and harms productivity.<br /></span><br /><span style="font-weight: 400;">Then they run their code and debug as usual.</span></span></p><p style="font-weight: 400;"><span style="font-size: small;">At least 75% of programming time is spent not on writing code, but on making sure that the high level instructions are exactly correct for generating bug free machine code - iteratively refining the lines of code. With code, a single comma out of place can break the whole program. When language has to be so carefully groomed, succinct minimal language is essential.</span></p><p style="font-weight: 400;"><span style="font-size: small;">For many developers, adding an imprecise, non mathematical language that is entirely unsuited to defining machine code instructions, such as English, to generate such code is problematic. It introduces a whole layer of imprecision, complexity and bugs to the process, slowing it right down, along with requiring developers to write a lot more sentences (in English) rather than just quickly typing out the succinct lines of Python (or similar) programming language they have in their head.<br /><br />The generative AI can help students and others who can hardly code yet in a computer language, but can it actually improve productivity for real, full time developers who are fluent in that language?</span></p><p style="font-weight: 400;"><span style="font-size: small;">I think that question is currently debatable. 
Because I believe the goal of adding yet another language, especially one as unsuited as English, to the stack of languages that need to be interpreted for humans authoring computer code, is only useful for people who are far from fluent in the software language.</span></p><p style="font-weight: 400;"><span style="font-size: small;">Once we move beyond error prone early releases of LLMs like ChatGPT-4, then tools such as CoPilot may start to become much more effective at authoring software, and actually produce code that is as likely to work first time, with the same number of bugs, as your average software developer's first cut of the code. We may reach that point within a few years. At which point professional software developers will need to be adept at using it as part of their toolset.</span></p><p style="font-weight: 400;"><span style="font-size: small;">Even so, I believe the whole conception of applying AI to writing software could benefit from more work on a computer centric alternative to the current approach focussed on generating plausible human language responses. That approach only dominates because of all the efforts related to NLP and human interaction. But taking that and sticking it on to writing human software languages is more about creating a revenue stream than attempting to have AI do the main work of software development.<br /></span></p><p style="font-weight: 400;"><span style="font-size: small;">Until then, AI will not be able to replace me as a software developer - only be another IDE tool I need to learn, in time, when it improves sufficiently to increase productivity.</span></p><p style="font-size: medium; font-weight: 400;"></p></h3><h2 style="text-align: left;">Another Way</h2>Copilot and the like currently use the ChatGPT approach of a chatbot front end tied to an English language LLM to generate aggregate search engine results in a human language. 
But there is no domain specific machine learning knowledge about the semantics of the content. So it doesn't understand, and certainly doesn't pre-check, the code - just as ChatGPT doesn't understand the search engine content, since currently there are no domain specific trained models for the content in the loop. So if asked a question about pharmacy, it doesn't plug in one of the AI models that has learnt pharmacy and is used by that industry to aid in the development of medicines. It understands nothing; it is a chatbot, just a constructor of plausible answers based on search popularity.<br />Similarly CoPilot has learnt how to predict what code somebody might be trying to write, but it hasn't learnt how to code.<p></p><p style="font-weight: 400;">This approach cannot lead to AI generating innovative new coding approaches or full self-coding computers, or remove the need for human readable high level programming languages.</p><p><span style="font-size: small;"><span style="font-weight: 400;">There have been experiments with applying </span><a href="https://medium.com/inspiredbrilliance/test-driven-generation-use-ai-as-a-pair-for-programming-6c1e0e4a8b45" style="font-weight: 400;">test driven development to AI generated code</a><span style="font-weight: 400;">, but I have not heard of serious attempts to address the bigger picture...</span><br /><br /></span></p><ul style="text-align: left;"><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Move all functional code writing to be AI only.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Remove the need for any high level computer language for humans to gain fluency in.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Have AI develop software by 
hundreds of thousands of iterative composition TDD cycles. </span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Refactor thousands of solutions in parallel to arrive at the optimum one.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Use AI that understands the machine code it is generating, by training it on the results of running that code. </span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">The ML training cycle must run the generated code, not match outputs against pre-ranked static result training sets.</span></span></span></li><li><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">In addition to the static LLM that encodes the learning of machine code authoring, dynamic training cycles should be run as part of the code composition - task based, ephemeral training models.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Get rid of the wasted effort training AI to understand English, Python, Java, Go or any other existing human language evolved for other tasks.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Finally we are left with the job of telling the computer what its software should do. <br />We do not want to use English for that, it's way too verbose and inaccurate; similarly we don't want a full high level programming language to do it. We need a new half way house. 
A domain specific language (DSL) for defining functionality only, designed for giving software specifications to AI, that it can use to generate automated test suites.</span></span></span></li></ul><p></p><h3 style="text-align: left;"><span style="font-size: medium; font-weight: 400;">Self-Programming Computers</span></h3><p style="font-weight: 400;">Exploring the last point in more detail...</p><p style="font-weight: 400;">Create a higher level pseudo-code language for describing the required functionality, one that is more English readable than even current high level languages such as Python.<br /></p><p style="font-weight: 400;">Make that functional DSL focus on defining inputs and outputs - not creating the functionality, but creating the black box functional tests that describe what the working code should do.<br /></p><p style="font-weight: 400;">Maybe add tools for a partially no-code approach, with visual generators for the language, e.g. graphical pipeline builder tools, for people who find thinking visually easier than thinking symbolically.</p><p style="font-weight: 400;">The software creator uses the DSL to create an extensive set of functional definitions for a project.</p><p style="font-weight: 400;">The DSL language design and evolution is optimised for LLM interpretation. 
So it has very tight grammatical and syntactical rules that promote accurate generative outputs.</p><p style="font-weight: 400;">A new non-developer friendly high level pseudo code language / rigorous AI prompt writing lingo.<br /></p><p style="font-weight: 400;">Some basic characteristics of the DSL:<br /></p><ol style="text-align: left;"><li>auto-formatting (like Go) minimizing syntactical variation</li><li>To quote Python's creator - 'There should be one-- and preferably only one --obvious way to do it.'<br />But strictly applied, rather than as a vague principle as Python does</li><li>unlike any other high level language, the design needs to be optimized only for specifying functionality - a high level templating language from which test suites are generated<br /></li><li>the language will never be used to implement functionality</li><li>uses simple English vocabulary and ideally minimal mathematical symbology</li></ol><p></p><p style="font-weight: 400;">These DSL definitions are written with the help of an LLM trained on the DSL itself: it helps create its own prompts, and the code creator uses it to refine all the DSL definitions that specify the full functionality. </p><p style="font-weight: 400;">The specification DSL auto generates all the required tests in a low level language.</p><p style="font-weight: 400;">The system should also have a generative AI LLM trained on C or assembly language.<br />This is what creates the actual functional code, by iteratively running and rewriting it against the specification encoded in the tests.</p><p style="font-weight: 400;">The AI tool then generates the tests for that implementation and uses TDD to generate the actual functional code - eventually the system should improve to a level better than most software developers. 
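</p><p style="font-weight: 400;">A hypothetical sketch of that specification-to-tests idea, in Python for illustration only - the spec format and helper names are invented, and the candidate implementation stands in for AI generated code that would be rewritten until the generated tests pass:</p>

```python
import re

# Invented example of a functional specification: inputs and expected
# outputs only - nothing about how the functionality is implemented.
spec = {
    "function": "slugify",
    "cases": [
        {"input": "Hello World", "output": "hello-world"},
        {"input": "Cloud  Postgres!", "output": "cloud-postgres"},
    ],
}

def make_tests(spec, impl):
    """Generate a black box test run from the spec's input/output cases."""
    def run_all():
        for case in spec["cases"]:
            got = impl(case["input"])
            assert got == case["output"], f"{case['input']!r} -> {got!r}"
    return run_all

# Candidate implementation - the part the generative AI would iterate on.
def slugify(text):
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

make_tests(spec, slugify)()  # raises AssertionError until the spec is met
```

<p style="font-weight: 400;">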
The code it writes no longer needs to be read by a human - because a human will be unable to debug it at anything like the speed the AI tool can.</p><p style="font-weight: 400;">So we use generative AI to do the part of the job that actually takes all the time: debugging, refactoring and maintaining the code, making sure it really does what is required functionally, rather than the quick job of writing a first cut of it that might run without crashing.<br /></p><p style="font-weight: 400;">Most importantly we don't introduce the use of the full English language - the language of Shakespeare, the language of puns, double meanings, multiple interpretations, shades of grey, implied feelings and emotions - into a binary world to which it is entirely unsuited.</p><p style="font-weight: 400;">Also we don't need English or high level computer languages in the stack of mistranslation at all, because we are not training the AI to understand human languages. We are training it to write its own machine code language based on defining what behaviour it should implement. <br />BDD / TDD generative AI, if you like.</p><p style="font-weight: 400;">Humans no longer learn complex mathematical process based languages that can be translated into machine code. 
They learn a more generic language for specifying functional behaviour.<br /><br />This gives more freedom to widen the DSL to mature into a general, precise AI prompt language.<br /><br />Whilst allowing computers to evolve more machine learning driven software architectures that are self maintaining, and not so constrained by the models imposed by current human intelligence and coding practice based programming languages.</p><p style="font-size: medium; font-weight: 400;"></p><h3 style="text-align: left;">Could AI take my job?</h3>Perhaps if all of the above were in place, then finally we would arrive at a place where AI could replace traditional software development and high level software languages.<br />With concerted effort it could be within 10 years, if some big companies put serious investment into trying to replace traditional software development.<br />Code monkeys will all be automated. Only software architects would be required, and they would use a new functional specification AI prompt language, not a programming language.<br /><br />Of course, if politicians are scared that dumb ChatGPT can already write as good a speech as they can, plus replicate all the prejudices and errors of its training data and trainers, then setting AI free to fully write software, and itself ... will be way more scary in its long term implications.<div><br />Meanwhile we are currently at a place where it arguably doesn't even improve productivity for an experienced software developer; it only allows non-developers, students and other language newbies to have a go at writing one of the many dialects of human languages known as computer languages. 
<br /><br />Their mix of maths, English, symbols, logic and process may appear more like English than musical notation or pure maths, but sadly they are no more suited to creation by an English language chatbot approach.<p></p></div><h1 style="text-align: left;">Sustainable Coding, and how do I apply it to myself as a Cloud engineer?</h1><p>Ed Crewe, 5 July 2023</p><p> I work as a developer of a Cloud service, Big Animal - EDB's Cloud Postgres product. So I went along to a meetup the other day, a panel discussion on <a href="https://www.meetup.com/bristol-cloud-native-devops/events/293342807/">Leveraging Cloud Computing for Increased Sustainability</a>.</p><p>It got me thinking about this whole issue, and how in a practical sense I could do anything that might reduce the carbon footprint of the service I work on. </p><p>The conclusion I came to was that I don't really know ... and to some extent neither did the panel. Cloud computing may give you some fancy tools to help assess these things, such as <a href="https://learn.microsoft.com/en-us/industry/sustainability/sustainability-manager-overview">Microsoft Sustainability Manager</a>. But there are no black and white answers as to what would make something more sustainable - even the basic one of whether to run it in the cloud or on prem very much depends on what you are running and how. 
One or the other may work out as the more sustainable.</p><p>So on a global scale, just how significant is computing as a percentage of global energy consumption and emissions?<br /></p><h4 style="text-align: left;"><span style="color: #38761d;">The Cloud Climate Issue</span><br /><br /><span style="font-weight: normal;">Comparing today with 30 years ago is useful in terms of seeing where we are going...</span><span style="color: #cc0000;"><br /></span></h4><div><span><div style="text-align: left;"><i><br />1990s vs 2020s IT as a proportion of global energy and emissions</i></div></span></div><p></p><ul style="text-align: left;"><li>1990s: energy 5% (most from office desktop computers and CRTs) - 2% of emissions</li><li>Today: energy 8% (most from personal devices, laptops and mobile; includes 2% for data centres) - 3% of emissions<br /><br /></li><li>Compute power / storage is around 30,000 times greater (by Moore's Law)<br /></li><li>Data has grown from 16 Exabytes (EB) to 10,000 EB, so over 600 times, with the majority in the last 3 years</li></ul><div><h4><span style="color: #cc0000;">Today data centres (hence the Cloud) are causing 2% of emissions - as much as the whole of IT in 1990, and as much as today's aviation industry.</span></h4></div><p></p><p>So working as a cloud engineer looks like a poor choice for someone concerned about climate change!<br /><br />But on the face of it we have been pretty efficient: our compute and storage has massively increased, yet consumption and emissions have only grown by around 50%. The issue is the acceleration in usage, which means we could double energy and emissions in 20 years if nothing was done to improve sustainability.</p><p>The increase in compute power has remained fairly consistent since the advent of the transistor, making Moore's Law more a law of physics than of human behaviour - although of course that technology is now at its limits of miniaturisation. 
So the energy and emissions consumed per Gigaflop of compute have drastically dropped - but now everyone has the compute power of a supercomputer in their pockets. <br />The first supercomputer to reach 1 GFlop was the Cray in the 80s; by the 90s an IBM 300 GFlops supercomputer beat Garry Kasparov at chess - today a Google Pixel 7 phone is 1200 GFlops.<br />Hence our consumption has rather outstripped our increase in compute.</p><p>But it is the <span style="color: #e06666;">explosion in data</span> that is a story of human behaviour. Hand in hand, we have reduced the costs of cloud storage and monetised personal data, with software companies valued on how many customers, and more importantly how much customer data, they have. Recent advances in AI have proved the value of big data lakes for training models to produce practical ML applications. </p><p><span style="color: #e06666;">Combine that with</span> <span style="color: #e06666;">the problem of induced demand. The more and bigger roads you build, the more traffic you get. Cloud puts a six lane highway outside everybody's front door</span>.</p><h4 style="text-align: left;">How do we measure sustainability?</h4><p>Within the world of commercial sustainability and carbon offsetting, there is a basic concept of categorizing things as scope 1-3 emissions.<br /></p><p></p><ol style="text-align: left;"><li>Scope 1 covers emissions from sources that a company owns or controls directly.</li><li>Scope 2 covers emissions that a company causes indirectly, from where the energy for the services it purchases and uses is produced.</li><li>Scope 3 encompasses everything else, such as suppliers' energy use.</li></ol><div>The assumption is that raw energy consumption is not the issue; it is the generation of climate changing emissions to produce that energy that is the metric. <br /><br />This includes mining for minerals to build laptops and data centres, etc. 
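</div><div>As a toy illustration of that scope bookkeeping (every figure below is invented), the difference between covering scope 1 and covering all three scopes can be sketched as:</div>

```python
# Toy scope 1-3 bookkeeping - every figure here is invented for illustration.
emissions = {
    "scope1": 120.0,   # tonnes CO2e from directly owned / controlled sources
    "scope2": 340.0,   # from generating the energy the company purchases
    "scope3": 2100.0,  # everything else: suppliers, devices, travel
}
offsets_and_removals = 150.0  # tonnes CO2e offset or removed

# "Carbon neutral" is read here as covering scope 1 only,
# while "net zero" must cover all three scopes.
carbon_neutral = offsets_and_removals >= emissions["scope1"]
net_zero = offsets_and_removals >= sum(emissions.values())

print(carbon_neutral, net_zero)  # True False
```

<div>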
But if you run your own green energy solar farm next to your data centre, directly powering it without any significant battery storage, plus feeding energy back to the grid, you can be pretty much carbon neutral. You can also fund renewable energy projects and offset. </div><div><br /></div><div>So strangely perhaps, given how cooling can treble hardware life span, the biggest data centres in the world are currently <a href="https://datacentremagazine.com/articles/new-world-record-set-for-largest-solar-powered-data-centre">built in the world's deserts</a> rather than at the North Pole. Solar and wind can be relied upon for more than 100% of their power.<br /><ul style="text-align: left;"><li><b>Microsoft</b> Azure was carbon neutral in <b>2012</b>. It is aiming for 2030 for its whole business (then 2050 for removing all its carbon debt since it was founded in 1975)</li><li><b>Google</b> Cloud became carbon neutral for all its data centres in <b>2017</b>, also aiming for 2030 for all its business.<br /></li><li><b>Amazon</b> is aiming for AWS cloud to be carbon neutral by <b>2025</b>, and as a global retail supplier to do the same for its whole business by 2040.</li></ul><div>Of course this is not possible in most European countries, so most carbon neutral data centres in Europe will be so by purchasing carbon neutral generated energy, rather than actually being neutral in themselves. Although some go a long way down that road, partnering with <a href="https://www.stelliumdc.com/green/?network=g&fm_kwd=sustainable%20data%20centre">renewable energy suppliers</a> and ticking a number of other sustainability boxes. The problem is that if data centres are buying up lots of the renewable energy supply at a premium, then they are removing it from residential or other uses. So this is hardly helping global sustainability, and in reality means they are far from neutral.</div></div><div><br /></div><div>Also, carbon neutral means only that scope 1 is covered. 
Net zero is a standard above carbon neutral: to deal with scope 2 and 3, emissions must be taken out of the atmosphere, so that in practice only a net zero supplier is actually contributing nothing to climate change. No cloud provider <a href="https://www.bloomberg.com/news/articles/2022-11-17/hidden-emissions-from-cloud-computing-pose-net-zero-threat">is net zero</a>.</div><div><br /></div><div>A key point is that the latest enormous scale cloud provider data centres are not the main source of emissions; it is all the older, smaller, more local data centres and machine rooms of servers that are causing the majority of the emissions, in the same way that car pollution is disproportionately down to older vehicles. Of course there is the manufacturing footprint to consider for cars that can last 40 years, but all computer hardware has a much shorter lifespan of 3-5 years. Obsolescence makes increasing the lifespan uneconomic - another green issue that could fill a blog post on its own.<br /><br />So moving to cloud providers' services, and migrating any remaining on prem systems to the cloud, is the sustainable thing to do - as long as what is moved is suited to cloud, or can be re-architected for the cloud.</div><p></p><h4 style="text-align: left;">What changes, as a developer, could improve sustainability?</h4><p style="font-size: medium; font-weight: 400;"><b>Lifestyle</b></p><div style="text-align: left;">So the obvious thing that people think of is the nature of their employer's work. Or perhaps, if your company is a B2B one, whether they have green standards wrt. the clients that they work with. For example it may not make sense working for ExxonMobil, the company with the world's largest emissions. Perhaps the tech industry equivalent would be working on cryptocurrency? 
But Blockchain developers are working on that reputation, even coming up with useful uses for it, such as auditing sustainability usage for scope 2 and 3 verification. </div><p>Over half of internet traffic these days is video streaming, so stopping watching Netflix and scrolling on TikTok, and reading or listening to books instead, is maybe a good behaviour 😉 <br />On the plus side, porn has dropped from its high of 25% of internet traffic down to around 10%, but it has been more than replaced by cat and side hustle millionaire videos it seems. So if your side hustle is being a prolific social YouTuber, it may not be the most ecological of life choices. An hour long short story as digital text is 100 Kb, whilst the same hour as 4k video is a hundred thousand times bigger at 10 Gb.</p><p>On a personal level, my previous employer was more office orientated. It was keen to encourage people into the office with free food etc., so it encouraged commuting in to work, and the maintenance of offices with permanent desk space for every employee, monitors, heating etc., and all the unnecessary extra emissions that entails. My current one is more remote-first.</p><p>In terms of remote work, having experienced pandemic lock downs in a city, when I was going out for a regular cycle for exercise, I can confirm that the reduction in emissions may have only been measured at 20% across the whole of Britain, but in the cities it felt more like 50% - the air was so much more breathable. Whilst maximising WFH is not equivalent to pandemic lock downs, it does make a difference. So changing jobs in the tech sector to a full-time remote position is certainly a worthwhile contribution to sustainability.</p><p>There is the argument that if we all lived alone in big drafty castles which could be turned off for the day by packing into an office a walk away, then remote working is not more sustainable. 
But the reality of IT work today, especially with hybrid working, is that the big, fairly empty building you are more likely to be in these days is the office.<br /><br />So become full time remote if you can. If you have to work for an office based employer, then choosing one that has hot desking, smaller offices and less frequent attendance, and living within walking or cycling distance, are all part of being sustainable wrt. your tech job.</p><h4 style="text-align: left;"><b>Sustainability for a Cloud SaaS company</b></h4><p>I work for a company that produces a cloud marketplace software product, with most engineers working remotely and running no servers at all, just employees' laptops, ie everything we run is via cloud providers' services. We have a few offices globally but only a minority of engineers use them. Since all teams are largely remote, there is no office, no paperwork, no commute and no physical products.</p><p>The same applies to all our other services, e.g. from CI to presentations, from LaunchDarkly to our CRM, from expenses to online mental health support etc. Plus Slack and Zoom for comms.<br /><br />This is a pretty common model - you could call it a server-less company - and it was the same at my previous employer. We sell SaaS and we use it for everything internally too.</p><p>Therefore the assumption is that the problem of working out scope 2 and 3 should be solved by those cloud providers, which to some extent it is ... maybe some less than others. But emissions data can be obtained for scope 2 and 3 from them.</p><p>So that leaves scope 1. This may be hugely affected by how much face to face sales and marketing goes on etc., but that is not my area. So I am purely going to focus on what options there are to improve sustainability wrt. the software architecture, development and deployment practices available for producing a cloud based software service, SaaS. 
Since those are the areas that, as a software engineer, I can influence.</p><p>So let's break that down to some basic elements, and work out what the more sustainable practices and approaches are.</p><p><b>Cloud vs. On Prem</b></p><p>So first things first. Is working for a company that runs everything on cloud, and delivers a cloud based product, a good thing, versus writing software for running in a local server room or data centre?<br /><br />Assuming you use one of the big carbon neutral cloud providers, and are using virtualisation to scale capacity efficiently with usage, then it is likely that a Cloud data centre will be run much more sustainably than a local data centre where you may house your own servers, and certainly more sustainably than a local machine room. So even if you are running a specialised HPC data-centre where the majority of traffic is local ... third party providers will be able to offer more sustainable options.<br /><br />Of course if your software is entirely unsuited to cloud virtualisation (k8s micro-services etc.) or badly designed for it, you could actually be running up way more resources than a local monolithic solution on a few dedicated servers would. So sustainability goes all the way down through the architecture to the lines of code, and what they are written in.</p><p>A whole load of legacy software dumped onto the cloud can be less sustainable (and way more costly) to run than running it locally. 
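</p><p>As toy arithmetic for why scaling capacity with usage matters (all the numbers below are invented), compare an always-on server with virtualised capacity that follows demand:</p>

```python
# Toy arithmetic - all figures invented - comparing a fixed on-prem server
# with cloud capacity that scales with demand over one day.
hours = 24
demand = [0.1] * 18 + [0.9] * 6   # fraction of peak capacity needed each hour

fixed_kw = 0.5                     # a server drawing ~0.5 kW around the clock
fixed_kwh = fixed_kw * hours

# Virtualised capacity: energy roughly proportional to use,
# plus a small idle overhead for the shared platform.
cloud_kwh = sum(0.05 + fixed_kw * d for d in demand)

print(round(fixed_kwh, 1), round(cloud_kwh, 1))  # 12.0 4.8
```

<p>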
</p><p>So another sustainable employment decision is to not work for an organisation that either has a lot of legacy software or has its own servers or data-centres - or at least only if they are bigger than a soccer pitch (ie average DC size or bigger) and have their own adjacent wind farm or other local renewable power source.<br /></p><p>But if, like my employer, everything is run on the three major cloud providers, and there is very little in the way of scope 2 and 3, then is the sustainable business box ticked already?</p><p>Unfortunately not. As mentioned, the providers are not net zero, and ~2% of global emissions come from running data centres. Whilst that may come disproportionately from ones that are not the self-powered giant DCs used by the big cloud vendors, being as efficient as possible wrt. use of Cloud is still the key to being a sustainable tech worker. Especially with the projected growth in Cloud and its emissions being a significant ecological concern.</p><h4 style="text-align: left;">Choice of software languages</h4><div><br />So the reference paper often quoted (<a href="https://www.efinancialcareers.com/news/2023/06/which-programming-language-uses-the-most-energy">and misinterpreted</a>) for software language sustainability is <a href="https://greenlab.di.uminho.pt/wp-content/uploads/2017/09/paperSLE.pdf">this Portuguese University paper</a> on Energy Efficiency across Programming Languages.</div><div><br /></div><div>We could perhaps regard sustainability as the combined goal of minimising energy, time and memory usage (table 5 in the paper). 
So leaving out the older / less mainstream languages we have ...<br /><ol style="text-align: left;"><li>C, Go</li><li>Rust, C++</li><li>Java, Lisp</li><li>C#</li><li>Dart, F#, PHP</li><li>Javascript, Ruby, Python</li><li>TypeScript, Erlang</li><li>Lua, JRuby, Perl</li></ol><div>So on that basis we should write everything in C or Go or possibly Rust, maybe even Java if we are not that eco-friendly.</div><div><br /></div><div>Whilst I do use Go for writing Cloud micro-services, I think the paper's focus on executing a few specific algorithmic performance tests is maybe not an entirely representative approach.<br /><br /></div><div>I have been a Python developer for 20 years and Python is ranked almost last for speed, some 75 times slower than C at the top spot. But even if that were the case across all uses, it ignores the fact that for compute heavy number crunching tasks, Python uses high performance libraries for the core processing functions. NumPy, for example, is half C and runs all the big matrix manipulations in C.</div><div><br /></div><div>Hence the API coding and setup is in Python, but it is not actually running everything 75 times slower than C; it is running maybe, at worst, at half the speed of a pure C program. Plus that custom pure C program could well have taken a lot longer to write and be less reusable, so in total use way more energy than a Python version would. Especially for short lived code and Jupyter interactive coding orientated use cases such as those found in the science and finance sectors. </div><div><br /></div><div>There are further optimisation approaches, such as <a href="https://numba.pydata.org/">Numba</a>, for when Python is being used for fast computational use cases, which can compile straight to CUDA machine code for GPUs. </div><div><div>A paper comparing <a href="https://www.mdpi.com/2076-3417/10/23/8521">Java, Go & Python for IoT</a> decision making. 
It similarly puts Go at the top for efficiency, but places Python above Java (presumably Python was using SciKit, hence C, for performance critical algorithm execution). So clearly the use case and the methodology of the study can make a huge difference to the measured efficiency.</div></div><div><br /></div><div>The same could probably be said for a number of the other languages languishing at the bottom of the table. If measured on executing a real world use case rather than a pure language implementation, the results can be much improved. <br />However for very nimble, lightweight micro-services, a directly compiled language like Go is going to use fewer resources than languages using JIT VMs and/or an interpreter. </div><br />Then there is the core point that most applications in the cloud are not highly intensive calculation based ones. The performance of the majority of applications is more likely to be determined by the data I/O on the network between services and storage, where raw algorithmic performance has little impact.<br /><br />What does matter is that running up parallel processes is simple and lightweight.<br />That core feature, along with the simplicity of Go and its small footprint, was designed specifically for cloud computing. Which means becoming a Go programmer, or at least learning the language, is a good choice for the more sustainable programmer.<br /><br />It is also why ML/Ops will often use Python at the development and testing stages of ML models, but then switch to Go implementations for production.<br /><br /><br /><h4 style="text-align: left;">Software Architecture</h4><br />The architecture that is deployed to cloud has a huge impact on the efficiency of a cloud service, and hence its sustainability. 
Certainly it is going to have much more impact on energy wastage than the raw algorithmic performance of the language used.<br /><br />The architectural nirvana of cloud services is that they are composed of many micro-services, each managing a discrete component of the service's functionality and each able to scale independently, to provide a demand driven, auto-scaled service that ramps up and down whatever components are required from it at any given time. Morphing itself to always provide just sufficient capacity. Not needing stacks of wasteful hot failover servers running without a job to do. Not getting overloaded at peak and failing to deliver on uptime. <br />The ideal sustainable use of hardware, always just enough. Virtualisation allows millions of services to ramp up and down across the shared Cloud providers' vast hardware farms.<br /><br />Clearly, combined with Big Cloud using the latest carbon neutral DCs, this ideal is much more sustainable than each company running its own servers and machine rooms 24/7 on standard grid non-renewable power, for a geo-local service that only approaches full capacity twice a day, and could probably be happily turned off 6 hours a night with nobody noticing.</div><div><br /></div><h4><div style="text-align: left;"><span style="font-weight: 400;"><span style="color: #38761d;">From this perspective, one that the big cloud vendors are keen to promote, Cloud is the sustainable solution, not the problem.</span></span></div></h4><p>Unfortunately that ideal is often very far from the reality.</p><p>Software that is essentially monolithic in design can end up being lifted and shifted to the cloud with little refactoring. At best the application is maybe chopped up into a few main macro-services: the UI as one, a couple of backend services, and the data store as another. Then some work is done to allow each to be deployed to Kubernetes as pods with 3 or more instances in production. 
Ideally the replicas are identical in role and have good load balancing implemented, or multi-master replication for the storage. But often primary-replica is as good as it gets.<br /><br />Essentially an old redundant physical server installation with a few big boxes to run things is being re-implemented via k8s. Then repeat that per customer, usage domain, geo-zone or whatever sharding is preferred. Big customers get big instances - the providers have wide sizing ranges for compute, storage etc.<br /></p><p>It's better than just setting up a VM to replace each of your customers' on prem boxes - and basically changing almost nothing from on prem installs - but any increased sustainability is only that provided by the Cloud vendor's DCs. The solution is not cloud scale with auto-scaling, it's repeated enterprise scale with a lot of fixed capacity in there.</p><p>For these cases maybe consider swapping out some elements with a cloud provider scaled service, eg the storage. Whether that is by using the Cloud provider's solution or a third party vendor's market place one.</p><p>Even for software that has been freshly written for the cloud there can be architectures that consume excessive resources and are overly complex, sometimes because of the opposite issue. With the budget to rewrite for cloud, developers can leap too fast to all the cloud scale solutions - when the service has no need of them. For example deploying multi-region Kafka for event streaming and storage, when data could happily have been sharded regionally and put into a small Postgres or MariaDB cluster. </p><p>Repeatedly firing up a 'micro-service' k8s job that is very short lived but uses a big fat monolith code base, so that 80% of the time and cost of the job is in the startup. 
This is where language matters more: the lighter and faster the language, and the smaller the binary and its memory usage, the better.</p><p>The use of gRPC between micro-services can provide up to 10 times the speed of REST, which can then be reserved just for the whole service's API to the UI and CLI.</p><p>One key indicator of waste is the obvious one of cost. If your new cloud deployed application is generating usage costs that work out far more expensive than the TCO for its on prem deployment, then its architecture is not fit for cloud use. You are burning money and generating excess CO2.<br /><br />Sadly with architecture it all depends on what suits the scale and use cases of a service. So there is no simple fix-it advice here.</p><h4 style="text-align: left;">Development, Testing & Release practices</h4><div><br />Testing and release are probably the most important areas of Cloud software development that could benefit from more sustainable practices. This is perhaps more a pitfall of the rise of Docker and infrastructure as code than of Cloud itself, but the promise of replicable, automatically built software environments has delivered. </div><div><br /></div><div>What it has delivered is a development to production life cycle where developers can spin up any number of their own development environments - even one per Jira ticket, automatically built on its creation perhaps. <br />In order to get merged with the release code, your team chooses to run the full E2E suite. It takes a little while, but we can speed it up by running the 5 clusters we need in parallel for each test environment case. 
These also stand up the whole environment, load it with fixture data and run E2E tests on it, maybe some infrastructure ones too, that fail over the storage and restore from backup.<br />But at least they should automatically tear down the test clusters at the end, whereas dev clusters can hang around for months without cleanup.<br />Then once it passes it goes out to the dev environment, which has its own testing barriers for release to staging. Staging should have the same hardware resourcing as Prod so that it properly tests that the release will work there, perhaps with some load testing - or maybe that is done in another set of clusters.<br />Finally it gets to roll out to production, but maybe for safety prod has a set of canary environments it goes to first, for final validation before it can be rolled out to customers.<br /><br />So to get 20 lines of code into production, we could easily have a process like the above, one that involves spinning up over 10 temporary k8s clusters and uses hundreds of longer lived ones. Just running the E2E and infra tests will take over an hour.</div><div><br /></div><div>This is seen as good practice in the Cloud world. Rigorous testing before release to production. It is pretty common for companies producing a cloud service. Since most software companies now have to produce a cloud version of their product to satisfy the market, that is a lot of companies. For the first year or so, all this will be run at the cost of millions of dollars, with hardly any customers using it. Because that is what you do. Agile: get the product out, then grow and refine it and the team developing it. 
Build it and the customers will come.</div><div><br /></div><div>This is a hugely wasteful process, and it is not far from Crypto in terms of generating emissions, for something that has no practical use yet.</div><div><br /></div><div>If we do end up with a lot of customers, fine. But for services that are not multi-customer architecture, ie big revenue, small customer numbers, there may well be customer specific customisations of the product ❄❄snowflake alert❄❄ So the easy option is the duplication of as many clusters in dev and staging as are in prod, to cater for fully testing those big clients. The result is a great deal of duplicate resource spend.</div><div><br /></div><div>So there should be a lot more consideration of sustainability when establishing the above practices for the development to release cycle.</div><div><br /></div><div>One way to address this issue is to push as much testing as possible down the testing pyramid.</div><div>Unit testing is less useful for cloud, since the whole point of Cloud and micro-services is for each service to do only one thing and knit the full service together via API calls. Which means there may be very little functionality that can be tested by a unit test, since everything needs to be mocked.</div><div><br /></div><div>However that doesn't mean that things cannot be faked; fakes allow fast functional testing of micro-services. Fakes can mean full emulators of services, eg. <a href="https://cloud.google.com/pubsub/docs/emulator">Google pub sub</a>, or running your gRPC services over gRPC's in-memory test fake, <a href="https://pkg.go.dev/google.golang.org/grpc/test/bufconn">bufconn</a>.<br /><br />But the aim should be to establish a full fake test framework that can run up your service on your laptop, ideally without the need of a k8s fake like <a href="https://kind.sigs.k8s.io/">kind</a> to stand it up, since we don't want to fake the deployment environment - just the running code. 
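As a rough sketch of that fake pattern in Go (the Publisher interface, InMemoryPublisher and NotifyOrder names here are hypothetical illustrations, not from any real service or library): the micro-service depends on a small interface, and fast functional tests swap in a working in-memory implementation instead of the real broker or emulator.

```go
package main

import "fmt"

// Publisher is a hypothetical seam the service depends on; in
// production it would be implemented by a real message broker client.
type Publisher interface {
	Publish(topic string, msg []byte) error
}

// InMemoryPublisher is a fake: a real, working implementation that
// just records messages in memory, so tests run on a laptop with no
// broker, emulator or cluster.
type InMemoryPublisher struct {
	Sent map[string][][]byte
}

func NewInMemoryPublisher() *InMemoryPublisher {
	return &InMemoryPublisher{Sent: map[string][][]byte{}}
}

func (p *InMemoryPublisher) Publish(topic string, msg []byte) error {
	p.Sent[topic] = append(p.Sent[topic], msg)
	return nil
}

// NotifyOrder is example business logic under test.
func NotifyOrder(p Publisher, orderID string) error {
	return p.Publish("orders", []byte("created:"+orderID))
}

func main() {
	fake := NewInMemoryPublisher()
	if err := NotifyOrder(fake, "42"); err != nil {
		panic(err)
	}
	// The recorded messages can be asserted on directly.
	fmt.Println(len(fake.Sent["orders"]), string(fake.Sent["orders"][0]))
}
```

The same idea scales up to whole gRPC services run over bufconn: the network and deployment are faked, but the code under test is the real service code.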
Functional tests can then be written that can be used like unit tests to check PRs pass in seconds as part of a git workflow. Running those same tests at regular intervals against full deployments can validate that they correctly mimic them.<br /><br />There should be layers of tests that validate code before the E2E test layer, not just unit and E2E, since otherwise the validity of the code relies on full deployment testing. Full deployment testing should only be run as part of the release process; it should never be run at the PR validation level, as it takes too much time and energy.<br /><br />Developers can have reasonably long lived personal dev clusters, not one per PR, or maybe even resort to shared dev clusters per team, to reduce spinning up excessive amounts of cloud resource for development.<br />Automated shutdown based on inactivity should be the norm.</div><div><br /></div><div>Time should be invested in developing good sample data for non production environments. They should not resort to duplicating all customers, regions or whatever sharding, plus a bunch of test versions of them. If you have more things running in dev than in prod, you are doing things wrong.</div><div><br /></div><div>Another route to take is to have only production as a long lived deployment, with temporary clusters for automated testing, and the use of feature flags to cater for final stage testing in production, in sandboxed feature enabled clusters, prior to full release. This separates deployment from release - the latter can then be moved outside of engineering, once a flag has passed testing and validation.<br /></div><div><br /></div><div>Temporary clusters can use tools such as <a href="https://www.vcluster.com/">vcluster</a> for automated short lifespan k8s clusters, significantly reducing the resource usage and speeding up the spin up time for dev clusters. 
Hundreds of pseudo separate k8s clusters for dev and testing can be run in a single k8s cluster.</div><div><br /></div><h4 style="text-align: left;">Anything else?</h4><p></p><p>The explosion in data is not just all video streaming. Observability is a huge topic; the amounts of telemetry and logging that a well SRE engineered service needs can be overwhelming. Clear management of that, and limits on retention (at least outside of cold - ie tape / optical - storage), are essential: such things as the ability to turn on higher info debugging levels for very restricted sets of environments, and providing valid ML training data sets without filling up data lakes of hot storage.<br /><br />There are still so many more things that impact Cloud sustainability in terms of Cloud applications ... however this blog post is already unsustainably long 😀. So I think I should end it here.<br /><br />The main point is that Cloud can be the sustainable option, but only if cloud engineers put in the effort to make it so, by pushing for the most sustainable architecture, development and release practices in our everyday work.</p><p></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p>Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com1tag:blogger.com,1999:blog-6603837339236629698.post-27929198111075753722023-05-13T15:31:00.007+01:002023-05-19T07:58:47.317+01:00Ten Software Engineering Managers<h2 style="text-align: left;">Engineering management from the perspective of the managed.</h2><div><span style="font-family: inherit;">I have worked in the software industry for many years, in both the public and private sectors, as an individual contributor software developer, SRE and cloud engineer. <br />Along the way I have been managed by 15 different managers, along with having work interactions with around another 100. 
So, naming no names, I thought it worth distilling my managed life into a set of software manager caricatures.</span></div><div>To illustrate what makes a good software manager (and a bad one).<br /><br /></div><div>I accept that, since I have never chosen to become one, I am criticising a skill I have shown no interest in acquiring myself. I have only partially dabbled in it, via senior IC roles, ie Staff Engineer technical advocacy. However it is a good chance to let off steam ... and possibly a software manager may read this and reflect on which characteristics they may want to work on. So guys, it's time for your 360!</div><div><br /></div><h3 style="text-align: left;">Remote managers are better</h3><div>Having worked for many years in offices with my manager sat at the desk behind me, literally looking over my shoulder, I should probably also admit that in recent years I have chosen to work full time remote, ideally with my manager in another time zone, or at least another country. Luckily I got on with those managers that were literally breathing down my neck. But it certainly didn't help me get the job done.</div><div><br /></div><div>Full time remote is probably not so good an option for those who are just starting their career. However for established engineers it does tend to insulate you from the various forms of toxic management out there and lets you get on with the job. It also requires you to be more explicit about the engineering process, collaboration and documentation, and hence be more productive in a more maintainable manner.</div><div><br /></div><div>Bullying, micro-management, undermining, overly personal behaviour etc. can all still happen in front of colleagues on Slack and Zoom. But it is easier to shut down a conversation and walk away virtually, so that the manager has time to calm down and control their behaviour. 
</div><div><br /></div><div>Managers skilled at managing remote international teams have to be adept at targeted, succinct and effective communication. Especially if they only have a few hours in the day when the team members' time zones overlap. </div><div><br /></div><div>So on average I have found remote managers to be better managers. Although that may just be because the UK is renowned for bad management - both anecdotally and in terms of the UK's productivity ranking. So managers from more productive countries than the UK are likely to be from a better management culture - hence better managers.</div><div><br /></div><h3 style="text-align: left;">Peer level managers are better</h3><div>In a traditional hierarchical organisation, the CTO is the most senior manager and so remunerates more highly those who are managers like themselves. So particularly in the public sector there is an unwritten rule that even the most junior manager must be paid more than the most senior engineer.<br /><br />This approach naturally leads to some rancour amongst senior technical staff who want to stay technical. It also devalues technical skills, since to increase their salary technical staff must eventually give up all their years of technical skills and somehow gain 10 years of skill in people management overnight. Of course this doesn't happen, so instead you get very novice people managers with a lot of largely irrelevant technical skills, and perhaps a personality totally unsuited to enabling their team.</div><div><br /></div><div>It is easy to filter out such an organisation. Just get on one of the company feedback sites, eg. Glassdoor, Fishbowl etc. Check what the top IC engineers' salary range is and check that it is higher than the junior management range. Ideally you should expect the top IC grade, eg. senior technical architect, to be paid around 50% more than an engineering manager. 
But take it as a red flag if there is no technical grade that is paid more than any management grade.</div><div><br /></div><div>Because it means the organisation doesn't value your engineering skills, and you are likely to be managed by someone who doesn't value them either and may regard themselves as your superior. So why would you work there? Surely better to work for a manager who treats you as a peer, in an organisation that values your skillset.</div><div><br /></div><h3 style="text-align: left;">Most people quit their job because they have a bad manager</h3><div>The surveys tell us 43% leave their job because of a bad manager, with the next most important reason being general toxic culture / under appreciation, whilst pay and progression comes in third.</div><div>We all want to get the number 1 manager, but unfortunately we often end up with 10.</div><div><div>So most people leave their jobs because of getting one of the bottom ranked manager types.</div><div><br /></div><div>HR's job is to minimise disruption to the company, so they will tend to take the side of the more senior employee unless that employee's behaviour is clearly proven to be detrimental to the organisation on a wider scale.</div><div><br />A few junior employees reporting them for incompetence or abusive behaviour does not usually qualify.</div><div>So if you do complain it is unlikely to improve the situation, and may well make it worse if the manager is informed that you complained about them. Sadly I have not personally heard of any case where complaints about bad management resulted in resolving a problem, but I definitely have heard of cases where it made things a lot worse for the remaining time before the complainant's notice period was done.<br /><br />So unless a more senior member of your organisation is on your side and decides to deal with the problem manager, it is best to leave your job if you have a bad manager. For a big company that may just mean changing departments. 
But leaving your job is safest in terms of removing yourself from a toxic environment, plus you can honestly report the manager as being the reason for leaving in your exit survey. It is a chance for you to help your ex-employer, though you shouldn't expect them to act upon it immediately. But given a sufficiently high attrition rate for a manager's team, the higher level managers should hopefully have enough sense and self interest to deal with their failing colleague.</div></div><div><br /></div><h2 style="text-align: left;">Ten Software Managers</h2><div><b><span style="color: #38761d;">1. The Enabler </span></b></div><div><br /></div><div>The ideal technical manager is the enabler. They have better personal skills than software skills.<br />They are most likely either to have never been an engineer, ie they entered the industry as a professional technical manager, or to have been an engineer some years ago who was more interested in the people than the code, so changed direction fairly early in their career.<br />They will be well aware of the wider environment of the organisation and the stakeholders and drivers for the technical work. A great communicator, with knowledge of all the systems and processes, and so of how to unblock things and get stuff done procedurally. Plus a talented scrum master.<br /> </div><div><i>Most likely to say:</i> How should we fix that process for you?</div><div><i>Catchphrase:</i> Thanks for your contribution 😊</div><div><br /></div><div><div><b>2. The Big Cheese</b></div></div><div><br /></div><div>You may find at some point in your career you get a manager who is actually a much more senior person in the organisation than their surface role as an engineering team manager. 
Maybe they founded the whole company, or they are the head of a large department with other non-software engineering work.</div><div><br /></div><div>To be high in the organisation they are likely to be a better manager than the average manager you might expect, with great people skills.</div><div><br /></div><div>(Obviously this rule may not hold once you get to CEO level. I hope nobody who ended up with Elon Musk as their direct line manager is reading this ... although I don't think he was interested enough in software to directly manage software engineers, given he gave up coding at 20 before getting any formal training in it.)<br /><br />They are likely to be more of a leader than a manager, and likely to be particularly good at fostering the development of their engineers. They are also going to have all the contacts to unblock any issue that may arise, plus have their finger on the pulse of high level changes and strategy that may impact your job.</div><div><br /></div><div>They may still have a surprisingly high level of technical understanding of the company's software, but as a senior manager they also understand that their job is all about enabling an even higher level of understanding, and technical decision making about it, in their engineers.</div><div><br /></div><div>On the downside, they probably don't actually have that much time to devote to you personally, so don't be surprised if they fail to act on things you have suggested to them. If there is any issue they make damn sure somebody in the team takes responsibility for sorting it out. So be prepared to be volunteered by them.</div><div><br /></div><div><i>Most likely to say:</i> How are you aiming to progress in the company?</div><div><i>Catchphrase:</i> Isn't it amazing what we are building at (fill in organisation name)? 🏆<br /></div><div><br /></div><div><b>3. The Bro </b></div><div><br /></div><div>They were happily doing their engineering job when the manager for their team left. 
They were not necessarily the most technical member of the team, but they were the one who got on with everyone and the one that everyone in the team was happy enough to have as a manager. So they took the job.</div><div><br /></div><div>They want to be your best friend and genuinely aim to protect you, as a member of their team, from problems or issues that are coming down from higher up the management hierarchy.</div><div><br /></div><div>They are reasonably technically aware and skilled, but don't try to make the technical decisions or deal with issues unless nobody in the team steps up to the plate; then they take the task on themselves.</div><div><br /></div><div>They are just one of the gang, but your manager too.</div><div><br /></div><div><i>Most likely to say</i>: Let me buy you a beer ... umm, sorry, everyone in the team has been made redundant.</div><div><i>Catchphrase</i>: Yeah, what the hell are management up to 🍻</div><div><div><div><br /></div></div></div><div><b>4. The Middle Manager</b></div><div><b><br /></b></div><div><div>They used to be a techie, many years ago, but they weren't really interested in tech and hence were probably pretty mediocre at their job. But thankfully they got on the management track, result! They love the politics and intrigue of management way more than technical details. They have good people skills but find it hard to hide the fact they have absolutely no emotional investment or interest in technology. <br /><br />They literally couldn't care less if the head of the organisation declared that from this day forth all software in the company would be written in Cobol and all open source banned from use. If that is what their boss says, then their job is to listen to their team moan and complain about it, and then tell them that is what they will be using. 
Since any technical objections the team gives are meaningless to them.</div><div><br /></div><div>On the plus side they do not micro-manage, they appreciate their team members' skills, and they are good at bringing those skills out.</div><div><br />The middle manager likes people and wants to please them, but knows that their job is only to please their superiors. From their team they just need compliance, and being good at their job is all about bringing their team onside with whatever the higher-ups require, however outlandish.</div><div><br /></div><div><i>Most likely to say</i>: Sorry it wasn't my choice, but come on, we need to get on with it.</div><div><i>Catchphrase</i>: I raised your issue with senior management, but no go. 😢</div><div><br /></div><div><b><span style="color: #ffa400;">5. The Invisible Man</span></b></div><div><br /></div></div><div>The invisible man works in a big organisation and knows the value of his super power. He used to do a bit of useful work, but that was years ago, when he still took enough interest in his job to get to the bottom rung of the management ladder. But over the years he realised he could get away with doing less and less actual work. He mastered quiet quitting years ago, before it became a thing. </div><div><br /></div><div>Since then he (and his boss) has worked out that if he gets a team of reasonably senior self starter engineers, they don't actually like or want to be managed. So a team that is difficult and opinionated for some managers is actually perfect for the invisible man. Ideally if they are a distributed team, then he can "manage" them remotely, with the minimum level of work. Send the odd email maybe, do one or two Zoom calls a week, and his work is done.</div><div><br /></div><div>His team may not respect him, they may even play jokes on him. But he so doesn't give a crap about work that he won't even notice them doing it. 
As a manager the invisible man is the mid-point of the top 10.</div><div>He is neither a good nor a bad manager; he is like having no manager at all. He will never support, challenge or rebuke you. At least nobody ever quits their job because they have the invisible man as their manager.</div><div><br /></div><div><i>Most likely to say</i>: Sorry, just had to step out for a minute.</div><div><i>Catchphrase</i>: Keep up the good work guys. 👻</div><div><br /></div><div><b>6. The Over Employed</b></div><div><br /></div><div>The over employed is a people pleaser. They like to say yes to everyone, including their managers and their team. They like to be seen dashing about doing things. So sure, they will do that for you tomorrow ... but tomorrow never comes! Because the over employed is too busy. They may even have got themselves a second job on the sly, thinking they can juggle both at once. They are such optimists; of course it will work out. </div><div><br /></div><div>They carry on saying yes to all those tasks that you need them to get done to unblock your work, just as they do to everybody else. So sure, they will sort out your performance review, and talk to the other manager that you need info from. But somehow they never seem able to deliver on time, if at all. <br /><br /></div><div>They will be there at 3am the night before a major deadline, chucking something together that is not quite finished and misses some vital component. But it is good enough, it should do what is needed.<br /><br /></div><div>Poor people pleaser, working their arse off to please people, so why is nobody that pleased with them?<br />But they stay cheerful, not going to let those moaners drag them down.<br />Oh well, if colleagues get too annoying they can always bail out. 
Get a job somewhere else and leave behind all those trivial little tasks that people kept bugging them with.</div><div><br /></div><div><div><i>Most likely to say</i>: Yes sure, I will do that.</div><div><i>Catchphrase</i>: Hiya guys, why the long faces? 🤯</div><div><br /></div></div><div><b><br /></b></div><div><div><b>7. The Team Player</b></div></div><div><b><br /></b></div><div>For the engineers in their team the Team Player is one of the best bosses they have had. They always have their backs, support them and persuade those above them to funnel more resources and authority to their team.</div><div><br /></div><div>They are ambitious and aiming to rise higher up the management tree, but loyal to their guys. They know their team is really the only one in engineering that is run properly. It is also the one doing the work that matters. They make sure they sweet-talk those above them and dedicate a reasonable portion of their time to making sure they know that they and their team are the keystone keeping the organisation running.</div><div><br /></div><div>They know that anybody looking to advance their career should be spending a good portion of each working day working on their own personal advancement. Don't be the fool who spends all day every day just doing the organisation's work.</div><div><br /></div><div>Unfortunately their highly competitive nature and self-belief can lead to self-delusion. They start to believe their own self-promotional narrative. This leads them to be contemptuous of those annoying flakes in other teams who are not doing anything of real value. Though generally energetic and positive, those outside their circle and below their grade get their aggressive, bullying, dismissive and unpleasant side.<br /><br />This behaviour also distorts the true importance and funding that the organisation should be devoting to their team's remit, to the detriment of other areas. 
So they can cause problems for the company as a whole, as well as for morale outside of their team.</div><div><br /></div><div><i>Most likely to say</i>: My team has got that, we will save the day.</div><div><i>Catchphrase</i>: Get out of my way, unlike you, I have a real job to do. 💪</div><div><br /></div><div><b><br /></b></div><div><div><b>8. The Bureaucrat</b></div></div><div><b><br /></b></div><div>Once upon a time, long long ago, some software companies decided they wanted to sell their wares to the ancient hierarchical institutions of government and learning. Those institutions believed in traditions and rules and processes and making things quantifiable. So something that did that for software management was perfect. So the companies came up with traditional names signifying regal wealth and power - Prince - along with naming their software Enterprise software, signifying new, wealth generating and boldly taking the initiative. That was in the 1980s.</div><div><br /></div><div>It was the perfect sales pitch for these outdated institutions and they bought into it wholesale. Although it took them about 25 years to get around to it, old institutions are like that. <br /><br />So their procurement processes and software and its life cycle and management were bound into reams and reams of bureaucratic processes. The IT managers in those institutions were groomed in the ways of Prince 2 and ERP and ITIL and all the rest of the snake oil the companies had invented. They devoted all their money and time to the training and meetings and processes around it. 
<br /><br />As far as the engineers in those institutions were concerned, a few of the processes were useful but that was far outweighed by the whole bureaucratic burden and costs they were wrapped in.</div><div><br /></div><div>The manager spoke at length for years at far too many meetings about the process and the newly procured systems, but unfortunately the quality and features of the institution's systems seemed to have gotten worse, whilst their cost became much, much greater.<br />The institution employed more and more managers, although they just managed projects, not people. But eventually there were so many that they needed managers for them too.<br />But they hired no more engineers.<br />Eventually the institution decided it didn't really need any in-house software engineers at all; why were they writing software when they should be buying proper Enterprise software from the companies?<br /><br />So the engineer realised they had to leave the kingdom of the Prince 2 and its manager and go to live in a different place altogether, where their business was making software. Strangely in the software republic they had never even heard of Prince 2. They vaguely knew of PMP, but nobody in their right mind would use it to make or manage software. </div><div><br /></div><div><i>Most likely to say</i>: I am sorry I cannot talk to you, until you have filed a change request.</div><div><i>Catchphrase</i>: Our KPIs show that we are on course for all our CSFs 👑</div><div><br /></div><div><b><br /></b></div><div><div><b>9. The Technical Superior</b></div></div><div><b><br /></b></div><div>The technical superior is at least a grade or two above you and always interacts with you as your superior.</div><div>They were an engineer and still secretly preferred being an engineer to their current job. They were never the best technical engineer in a team, so they compensated for that by imagining they were the best at seeing the big picture engineering-wise and still are. 
So they decided to become a manager to make sure the right technical decisions are made.</div><div> <br />They probably preferred their old job because they didn't have to spend so much time on politics and relating to people. They have been a full time manager for at least 5 years and their technical knowledge and judgement have dated badly. However they still see themselves as the person most qualified to make technical decisions. The more senior they become, the more out of date their technical knowledge becomes, yet the bigger and more expensive are the technical choices they make for their organisation. </div><div><br /></div><div><i>Most likely to say</i>: Never ask a bunch of developers to decide on technology, ask 5 and you get 5 different answers.<br /><i>Catchphrase</i>: We really need a big monolithic Java XML SOAP web service to do that. 🙈</div><div><br /></div><div><br /></div><div><b><span style="color: red;">10. The Rockstar Techie</span></b></div><div><br /></div><div>The very worst engineering manager is the rockstar techie.</div><div><br /></div><div>Rockstar techies have great technical skills and may have technically saved the day a few times for senior management. So their lack of personal skills is overlooked, and they tend to behave better towards people above them in the management hierarchy anyway.<br /><br />But in the long term they are damaging to the quality of your codebase; the more senior they become, the more damage they do. The common technical issues they cause are blocking the devolution of architectural decisions and diverse input into them; possessiveness over code or technical knowledge; and wanting to be the saviour for technical problems, outages etc. 
<br /><br /></div><div>However the damage they do to the code is minor compared to the huge damage they do to the engineering team and culture if they are put in a management position.</div><div><br /></div><div>They were often the most senior technical engineer in their part of the company. They have probably been there a while and to justify getting another pay rise they got lumped with doing management too. They regard management as an annoying burden tacked on to the side of their real engineering job. They aim to remain the lead engineer in the team and make all the final technical decisions. They cannot devolve technical authority and have no interest in picking up any management skills. So they are most likely to exhibit basic interpersonal failings: having a technical superiority complex, being rude, moody or bullying, micro-managing etc.</div><div> <br />As time goes on they must devote more and more time to management and yet cannot accept no longer being the most technically adept guy in the room. 
A paradox that can only be solved by driving away any members of staff who challenge them technically and effectively dumbing down the technical skills of the whole team.</div><div><br /></div><div><i>Most likely to say</i>: You Fffing broke it you moron, when coming across a feature change implemented in a way they didn't expect.</div><div><i>Catchphrase</i>: You are wrong, this is how it must be done, idiot 👹</div><div><br /></div><div><br />(Any resemblance to persons living or dead is purely coincidental)</div>Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-73140469261777884292023-01-27T14:34:00.014+00:002023-01-31T10:11:40.833+00:00Tech sector lay offs<p><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);"><i>INTRO: Having failed to post any technical articles for a few years I feel that my blog is at risk of dying from lack of attention. So to avoid that, I have decided to mix up its content a bit and diversify from long Technical HOWTOs to more casual short posts whose tech content may vary (or not exist at all) ... so to kick off this one is a short rant about tech news!</i></span></p><h3 style="text-align: left;"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">Fake News about Tech Industry Collapse</span></h3><p><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">A number of bigger tech companies are laying off staff at the moment. The press reports this as related to them making terrible errors in misreading trends post-pandemic and suddenly hiring way more staff than they normally would over the last year, 2022. Now reality has hit and the tech sector is awash with newly redundant workers as big tech desperately tightens its belt to survive. 
But is there any truth in either the premise of this argument or its conclusions?</span></p><p><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">A random recent example that repeats these ubiquitous assumptions ... <a href="https://fortune.com/2023/01/23/big-tech-layoffs-15-20-percent-next-six-months-top-analyst-says/">https://fortune.com/2023/01/23/big-tech-layoffs-15-20-percent-next-six-months-top-analyst-says/</a> </span><span style="font-size: inherit;">... citing the not-for-profit basket case, Twitter, being turned into a uniquely loss-making zombie company by Elon Musk, as though Google, Amazon and Microsoft had something in common with it as a business!</span></p><p><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">If you look at the graphs of the employee counts of these companies year on year, then only Microsoft hired more than usual last year, 2022, and Amazon, to cope with its pandemic boom, did so in 2020. Google last had an uncharacteristic hiring spree in 2012. Similarly, Facebook's and other big tech companies' growth curves in recent years exactly followed those of the last decade. There was no extra hiring.</span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">So why the layoffs? Nothing to do with over-hiring. Simple - look at the share price curves instead. 
<br />The market is valuing most big tech lower and recession looms - they need to chop staff to chop costs and make their finances look better to reduce that drop for their shareholders - the biggest of whom are the CEOs of those companies. <br /><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">The big companies are still making billions in profit (they are not loss makers), so over the long term it would cost them less to retain talent, and they can afford to. However the CEOs' personal short-term loss in wealth is something that they can't stomach, and it is a good excuse for a clear-out.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);"><br /></span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: 
var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">Obviously there are large loss makers, the most prominent of which is Twitter, but they are special cases with failing business models - Twitter only ever made a profit in the run up to Trump's election as it became a huge engine churning up ideological conflict with political and conspiracy fictions. Without a politically polarised USA to drive an explosion of lies and social media wars, it was always a loss maker. It has nothing to do with big tech trends. </span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; 
padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">Apple is a real exception to the trend, currently, but only because its share price hasn't dropped significantly yet. Hence it hasn't done its layoffs yet.<br /><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">So the real reason big profitable tech is laying people off is a temporary fix to save a few billion from the bear market's current swipe at the personal wealth of their CEOs, even though being a slave to short-term market-driven fire/rehire cycles will cost the company more in the long term. 
It is purely a personal choice to save personal wealth: there was no over-hiring, there is no need for redundancy, there is no downturn in the growth in demand for the tech sector, there are just fewer cheap loans around to fund it.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"> <br />The jump in the cost of borrowing is due to what looks to be a short-term hike in interest rates, nothing to do with the tech sector itself. It is largely due to Putin starting a war of revenge against his long-dead ghosts from 35 years ago, when the Soviet Union fell as he presided over the KGB. A war against ghosts can never be won ... 
but unfortunately its far more terrible human cost will carry on for years.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">This becomes clear on the other side of the market coin, late stage startups are often making no profit at all - because all profits along with loans are ploughed into growth. 
Because they must never be seen to be shrinking - to keep growing their valuation for IPO.</span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);"><br /></span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">So a lot of software companies are hiring even though strictly they don't need to, whilst the big boys are firing when they don't need to.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: 
var(--artdeco-reset-base-vertical-align-baseline);"><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">Meanwhile the number of software jobs as a whole keeps growing at 10% year on year. The demand continues to outstrip supply and wage inflation follows. So if you are a talented (if overpaid) software engineer ... I wouldn't worry too much about the layoffs. It is just a chance to take a redundancy bonus and a 20% pay rise to try something new. Unfortunately for those that were used as a disposable foreign human resource by big tech, via job dependent visas, it is <a href="https://www.linkedin.com/feed/update/urn:li:activity:7025204309470388224/?origin=SHARED_BY_YOUR_NETWORK">a different story</a>. They may well not have time to stop their CEO's thoughtless greed needlessly disrupting their lives.<br /><br />Many predictions are that interest rates will drop and the bear market end in a year's time. At which point the mass layoffs will be reversed. But the big tech companies will have lost a lot of money, and a great deal more trust, by following the market and each other so closely. 
The lesson employees will have learnt is that loyalty to such companies will never be rewarded or returned.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><br style="background-color: white; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-family: -apple-system, system-ui, "system-ui", "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 16px; white-space: pre-wrap;" /></p>Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com1tag:blogger.com,1999:blog-6603837339236629698.post-82259903190183187312020-02-13T17:41:00.003+00:002020-02-14T10:06:34.714+00:00K8s Golang testing - Mocks, Fakes & Emulators<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "arial" , "helvetica" , sans-serif;">A lot of the Go code I write is developed against Google's Kubernetes API.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The API is fairly large and given that the code is mostly calling K8s then it inherently has a set of complex dependencies, these dependencies have time and costs associated to run up for real as K8s clusters in cloud providers data centres.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So how can we test K8s Go code ... or any Go code with significant dependencies. We must use substitute objects that simulate the dependency objects. There are three common terms for these substitutes known collectively as test doubles. They are mocks, stubs and fakes. Unfortunately these terms are all pretty <a href="https://en.wikipedia.org/wiki/Mock_object">interchangeable</a>. So before I start bandying them about, I had better define what I mean by these and related terms for this blog post ...</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Stub</b> = </span><span style="font-family: "arial" , "helvetica" , sans-serif;">A function, method or routine that doesn't do anything other than provide a shim interface. If a stub returns values then they are dummy values (possibly dependent on calling args, or call sequence) that are either fixed or generated for a fixed range.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Mock</b> = An object which replicates all or part of the interface of another object, using stub methods.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Fake</b> = An object that </span><span style="font-family: "arial" , "helvetica" , sans-serif;">replicates </span><span style="font-family: "arial" , "helvetica" , sans-serif;">all or part of the interface of another object, and has methods which are not all stubs, so some methods perform actions that simulate the actions of the real object's method.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span><span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Emulator (full fake)</b> = A package that has a significant amount of faked methods (rather than stubbed ones). For example a database server will normally provide an in memory database configuration that will completely replicate the core functionality of the database but not persist anything after the test suite is torn down.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Normally an emulator is not part of the code base and may require service setup and teardown via the test harness. As such, use of emulators tends to be for integration tests rather than unit tests.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Spy</b> = A stub, mock or fake that records any calling arguments made to it. This allows for subsequent test assertions to be made about the recorded calls.</span><br />
<h2 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">K8s Go Unit Testing</span></h2>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Given that unit testing by definition should not need any dependencies, then the assumption might be that for dependency heavy code, most unit tests will require test doubles ... the follow on assumption is that </span><span style="font-family: "arial" , "helvetica" , sans-serif;">double == mock.</span><br />
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Mocks</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Hence a standard approach to this is to use one of the Go's many generic mocking libraries, such as <a href="https://github.com/vektra/mockery">https://github.com/vektra/mockery</a>, <a href="https://github.com/stretchr/testify">https://github.com/stretchr/testify</a> or <a href="https://github.com/golang/mock">https://github.com/golang/mock</a>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">There are numerous tutorials and explanations available to get Gophers started with them, for example this walk through of <a href="https://blog.codecentric.de/2019/07/gomock-vs-testify/">testify and mock</a>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">These mock tools all offer test generation from introspection of your API calls to the dependency, to reduce the maintenance overhead.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So for everything we write tests for we generate mocks that reflect our codes API and confirm that it works as we expect. However there is a problem with this, the problem is described in detail in this blog post by <a href="https://sendgrid.com/blog/when-writing-unit-tests-dont-use-mocks/">Seth Ammons</a> or in summary by <a href="https://testing.googleblog.com/2013/05/testing-on-toilet-dont-overuse-mocks.html">Testing on the Toilet</a></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The issues with mocks are:</span><br />
<ol style="text-align: left;">
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Mocking your code's calls to an API only models your usage and assumptions about it, it doesn't model the dependency directly. It makes your tests more brittle and subject to change when you update the code.</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Mocks have no knowledge of the dependencies they are mocking so for example as Google's APIs change - your real code will fail, but your mocked tests will still pass.</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Mocks may use call sequenced response values, so making them procedurally fragile, ie changing the order of your test calls can break mocked tests.</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">If you want to swap out one library with another for a component, then because your mocks are specifically validating that libraries API, your mocks of it will need to be regenerated or rewritten.</span></li>
</ol>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So what is the alternative... </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Fakes</span></h4>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Refactor your code to be testable by using interfaces.</b></span></div>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b><br /></b></span></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Break things down into simpler interfaces and create fakes that implement the minimum methods needed for testing. Those methods should simulate some of the business logic, so that they test the code and the relationships between method calls better than pure stubs would. Your model of the dependency is then direct, rather than based just on your calls to its API, so it is arguably easier to debug when that model and the dependency (and your evolving use of it) diverge. </span><br />
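As a minimal sketch of this pattern (the interface, type and function names here are illustrative, not from any K8s or Google library), a narrow interface covers only the operations the code under test needs, and an in-memory fake implements it:

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectStore is a narrow, hypothetical interface covering only the
// operations our code actually uses from the real storage client.
type ObjectStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// FakeStore implements ObjectStore in memory, simulating just enough
// behaviour (missing-key errors) to exercise the calling code.
type FakeStore struct {
	objects map[string][]byte
}

func NewFakeStore() *FakeStore {
	return &FakeStore{objects: map[string][]byte{}}
}

func (f *FakeStore) Put(key string, data []byte) error {
	f.objects[key] = data
	return nil
}

func (f *FakeStore) Get(key string) ([]byte, error) {
	data, ok := f.objects[key]
	if !ok {
		return nil, errors.New("not found: " + key)
	}
	return data, nil
}

// CopyObject is the code under test: it depends only on the interface,
// so the fake can stand in for the real client in tests.
func CopyObject(s ObjectStore, from, to string) error {
	data, err := s.Get(from)
	if err != nil {
		return err
	}
	return s.Put(to, data)
}

func main() {
	store := NewFakeStore()
	store.Put("a", []byte("hello"))
	if err := CopyObject(store, "a", "b"); err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	data, _ := store.Get("b")
	fmt.Println(string(data))
}
```

Because CopyObject depends only on the ObjectStore interface, the same code runs unchanged against the real client in production and the fake in tests, and the fake's simulated not-found behaviour exercises the error path as well.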
<br />
<b><span style="font-family: "arial" , "helvetica" , sans-serif;">Use ready made Fakes</span></b><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">But back to K8s and Google APIs ... in some cases the Google component libraries already have fakes as part of the library. For example pubsub has pstest. So you can just add the methods required so that things work for your test. In which case faking can be simple ...</span><br />
<br />
<script src="https://gist.github.com/edcrewe/821000ba7d370e34602b22170fc422c3.js"></script>
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The client-go library has almost 20 fakes covering most of its components, but the only other ready-made fakes in the K8s and Google Cloud Go libraries (that I could find!) are for pubsub and helm:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">cloud.google.com/go/pubsub/pstest/fake.go</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">k8s.io/helm/pkg/helm/fake.go</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">k8s.io/client-go/tools/record/fake.go</span><br />
<span style="font-family: arial, helvetica, sans-serif;">k8s.io/client-go/discovery/fake</span><span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">k8s.io/client-go/kubernetes/typed/core/v1/fake</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">... etc</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">However there are also third party libs for fakes such as </span><br />
<a href="https://github.com/fsouza/fake-gcs-server"><span style="font-family: "arial" , "helvetica" , sans-serif;">https://github.com/fsouza/fake-gcs-server</span></a><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Use custom built Fakes</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">If there is no existing fake, or it doesn't fake what you need, you can write your own. However, for the Google libraries the APIs you need to replicate are not simple, and you may need to simulate a number of methods for your tests, so manually creating a fake and maintaining its API against the real Google API can become too much work compared with autogenerating mocks. </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />Google have sensibly anticipated this and hence released </span><span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="http://github.com/googleapis/google-cloud-go-testing">google-cloud-go-testing</a></span><br />
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">This package provides the full set of interfaces of the Google Cloud Client Libraries for Go, so there is no need to generate partial mock interfaces or maintain your own fake versions of its APIs.</span></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">As an example it can be used to create a fake GCS service, where data is just written to memory (in the global bucketStore variable)</span><br />
<br />
<script src="https://gist.github.com/edcrewe/f68f7fbe86b20abf8ed55ff4cc9e4da9.js"></script>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The test substitutes the FakeClient for the real storage client. In order for the code to accept either the real or the fake client as the same type, the library provides an AdaptClient function so that both conform to the storage interfaces package (stiface).</span><br />
<pre style="background-color: white; font-family: Menlo; font-size: 9pt;">c, err := storage.NewClient(ctx, option.WithCredentialsFile(apiCredsFilename))
client = stiface.AdaptClient(c)</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h2 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">K8s Go integration Testing</span></h2>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For integration tests you ideally want to use the real dependencies, but if they are too slow or costly then they may well be best replaced by emulators.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Using gcloud emulators</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b><br /></b></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Google also provides a number of full emulators to cater for speedy local integration testing,</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://cloud.google.com/sdk/gcloud/reference/beta/emulators">https://cloud.google.com/sdk/gcloud/reference/beta/emulators</a> which cover </span><span style="font-family: "arial" , "helvetica" , sans-serif;">bigtable, </span><span style="font-family: "arial" , "helvetica" , sans-serif;">datastore, </span><span style="font-family: "arial" , "helvetica" , sans-serif;">firestore and </span><span style="font-family: "arial" , "helvetica" , sans-serif;">pubsub.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">So as part of your integration test setup you can fire up the datastore emulator, for example:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<pre style="background-color: white; font-family: Menlo; font-size: 9pt;">> export DATASTORE_EMULATOR_HOST=localhost:17067</pre>
<pre style="background-color: white; font-family: Menlo; font-size: 9pt;">> gcloud beta emulators datastore start --no-store-on-disk --consistency=1.0
  --host-port localhost:17067 --project=my-project</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The datastore client can then just be connected to the emulator for testing:</span><br />
<pre style="background-color: white; font-family: Menlo; font-size: 9pt;">client, err := datastore.NewClient(context.Background(), "my-project")</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Using EnvTest and a local K8s API server</b></span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The </span><span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest">EnvTest</a> package creates a Kubernetes test environment that will start / stop the K8s control plane and install extension APIs. The K</span><span style="font-family: "arial" , "helvetica" , sans-serif;">8s API server (and its etcd store) is by default a local emulator service (although it can also be pointed to a real K8s deployment for testing if desired).</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">EnvTest is wrapped up as part of Kubebuilder which is the primary SDK for rapidly building and publishing K8s APIs in Go. </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">EnvTest caters for testing complex Kubernetes API calls of the type that might be required for testing a K8s operator, for example. Hence when generating code for building an operator, <a href="https://book.kubebuilder.io/reference/testing/envtest.html">kubebuilder</a> uses the controller-runtime in its boilerplate for running this up for a template integration test.</span><br />
<br />
<script src="https://gist.github.com/edcrewe/e8f87e606ab4a3d4fb6b9c46a8ca4a1f.js"></script>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h3 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Summary</span></h3>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So if your K8s Go code is tested only by mocks for unit tests, and by running up a real Kubernetes cluster for integration tests, then maybe it's time to re-evaluate your testing approach and start using the tools for fakes and emulators that are available. The only issue is that they are quite numerous, with a mix of sources, so picking the right combination of Google internal library, Google or third party test package, and custom built fakes and emulators becomes part of the task.</span></div>
<div>
<br /></div>
</div>
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-4598318034148052782019-09-01T15:35:00.001+01:002019-09-02T10:44:01.206+01:00Teaching an old Pythonista new Gopher tricks<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "arial" , "helvetica" , sans-serif;">I recently got a new job where I need to write a lot of Golang, so needed to learn it.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">I figured that you don't really learn a language unless you try to write code that actually does something useful. However, having been to a recent Golang meetup where someone had come to a similar conclusion and had written a full emulator of the <a href="https://github.com/Humpheh/goboy">Gameboy in Go</a>, I also figured I wanted to do something that was not quite so complex or low level ... i.e. something that could hopefully be done in a week.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So I decided to take the plunge by creating an open source package that does the same job, as a Python one that I released many years ago called <a href="https://github.com/edcrewe/django-csvimport">django-csvimport</a>. A simple add-on for the Django ORM that caters for loading data to models from CSV files, with the option to generate the model code from scratch for a CSV file by checking the data fields and determining the data type for each column.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Also, doing a task where I had already solved the problems in another language would mean I could just focus on how Golang might approach the problem, not the problem itself. So this post is about the practical differences between writing a Python and a Golang solution. As such it compares the languages as tools for a certain job, which I hope is complementary to the many posts that compare the languages themselves. Suffice it to say, they differ in many ways ... most significantly in static vs. dynamic typing ... whilst being most similar in regarding readable, consistent, simple syntax as paramount - where other languages have different priorities - hence for both, auto-formatting code is good practice, with Go's built-in gofmt doing the job of Python's <a href="https://docs.google.com/presentation/d/1rpQlJTv9uBWicuu2cQURG1Yfu-j4cK-EDd9LwbD-SdA">black or yapf</a>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So firstly, Django is one of the leading full web frameworks for Python, so what is the equivalent for Go? Gorilla, Gin, Buffalo etc. - there are plenty of frameworks, but which is the leading one with an ORM? I tried out a couple, but reading around it became apparent that if you choose to develop a web app in Go, the majority of devs don't use a framework at all! So already the differences in the languages were becoming apparent. Reasons? If you choose Go for creating a web app then performance may be a significant requirement, and even micro frameworks can be slower than raw code. Go is a recent language and as such has lots of web related features built into the core already ... templating, etc. - and even imports are URL based - so a web framework in Go gives you less than it does in Python.</span><br />
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So instead I checked out Go ORMs and decided to write an extension package for <a href="https://gorm.io/">Gorm</a> as one of the leading Go ORMs.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So ditching the Web Framework / UI integration features of django-csvimport as an unnecessary extra, then the problem just consists of two parts, creating ORM model definitions that create relational database tables and parsing the CSV files to import the data to those tables.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">From this high level spec. the core functional components that compose the tool that we want to rebuild in Golang are:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<ol style="text-align: left;">
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">CLI interface to take arguments specifying source files and actions to perform</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">An ORM to manage vendor independent database schema creation and population</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Utility to inspect data sources and determine data types</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Template tool to create ORM models (metaprogramming)</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">CSV parser to read in CSV files - ideally capable of handling various formats and poor or inconsistent formatting - ie real CSV files!</span></li>
</ol>
</div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For all of these we would hope that language level packages are available to do the heavy lifting. Then the package can just knit them together into a CSV to relational database import utility.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So stepping through these and rating Go vs Python...</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
CLI framework (draw)</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">As a minimum, our task requires a command line utility to point to the CSV data files to be imported.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Django comes with a CLI framework in the form of management commands. For our Go CSV import, <a href="https://github.com/edcrewe/gormcsv">gormcsv</a>, we just have the ORM so we could roll our own CLI handling, but in this case that is probably not a great idea, since like Python, Go has a dominant CLI framework - <a href="https://github.com/spf13/cobra">Cobra</a> equates to Python's <a href="https://click.palletsprojects.com/">Click</a>. It uses the <a href="https://github.com/spf13/viper">Viper</a> config framework which is like Python's core <a href="https://docs.python.org/3/library/configparser.html">configparser</a> lib with extras. Within the gormcsv module I created these CLI command go files as a cmd package via Cobra's autogenerate feature and used them to wrap the importcsv.go and inspectcsv.go source files in the importcsv package that do the real work.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
ORM (draw)</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Any language's leading ORMs should cope with the database management and data population tasks, and GORM is functionally similar in its capabilities to the Django ORM.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
Data source introspection tool (Python win)</span></h4>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://messytables.readthedocs.io/en/latest/">Messytables</a> is a mature package designed for the task of scraping in data from various heterogeneous third party sources - possibly of poor quality. As such it is one of the many utilities created around Python's well established role in the data analytics realm. Go has no such tool. There is no third party package to cater for inspecting, type checking and cleaning up data sources :-(</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So we have to make our own much simpler data inspector, which will hopefully cope OK with the most common data types if they are reasonably consistently formatted.</span></div>
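A drastically simplified sketch of such an inspector (the function is invented for illustration, not the actual gormcsv code): try progressively stricter parses on each value and fall back to string:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// detectType guesses a column type for a single CSV value by trying
// progressively stricter parses, falling back to string. This is a
// toy version of what a data inspector must do per column.
func detectType(value string) string {
	if _, err := strconv.ParseInt(value, 10, 64); err == nil {
		return "int"
	}
	if _, err := strconv.ParseFloat(value, 64); err == nil {
		return "float"
	}
	if _, err := strconv.ParseBool(value); err == nil {
		return "bool"
	}
	// Only one common layout is tried here; real data needs many.
	if _, err := time.Parse("2006-01-02", value); err == nil {
		return "date"
	}
	return "string"
}

func main() {
	for _, v := range []string{"42", "3.14", "true", "2019-09-01", "hello"} {
		fmt.Printf("%q -> %s\n", v, detectType(v))
	}
}
```

A real inspector would sample many rows per column, try many date layouts, and widen the column type (e.g. int to float to string) whenever values disagree.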
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
Templating tool for creating models (Go win)</span></h4>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For GORM and Django the ORM models are implemented directly as classes in the language rather than using an intermediate DSL or XML etc. So to create models based on introspecting source data metaprogramming must be used to generate code.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Templates are available in the core of Go. Also, given that Go is statically typed and has no generics, the best alternative for some problems that generics would solve is metaprogramming, so templated generation of Go code is a normal Go pattern. Arguably then, this is better (core) supported in Go than in Python. For Python, code generation is rarely needed, and my original django-csvimport implementation just uses string construction and didn't even employ one of Python's many add-on template packages, e.g. Django or Jinja2 templates (hmm, needs a rewrite!).</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Note that both languages have fully functional reflection / introspection libraries in the core.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
CSV Parser (Python win)</span></h4>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Most important to this application is the quality of the CSV parser. This is where Go is sadly completely let down. Its CSV parser is frankly inadequate and can only cope with CSV that is strictly formatted according to <span style="background-color: white; color: #3e4042;">RFC 4180.</span></span></div>
<div style="text-align: left;">
<span style="background-color: white; color: #3e4042;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></span></div>
<div style="text-align: left;">
<span style="background-color: white; color: #3e4042;"><span style="font-family: "arial" , "helvetica" , sans-serif;">To quote from Python's <a href="https://docs.python.org/3/library/csv.html">csv parser library</a> ...</span></span></div>
<div style="text-align: left;">
<span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;"><br /></span></div>
<div style="text-align: left;">
<i><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; text-align: justify;">CSV format was used for many years prior to attempts to describe the format in a standardized way in </span><span class="target" id="index-1" style="background-color: white; color: #222222; text-align: justify;"></span><a class="rfc reference external" href="https://tools.ietf.org/html/rfc4180.html" style="background-color: white; color: #6363bb; text-align: justify;"><strong>RFC 4180</strong></a><span style="background-color: white; color: #222222; text-align: justify;">. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources.</span></span></i></div>
<div style="text-align: left;">
<i><span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;"><br /></span></i></div>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; text-align: justify;">TBH Python 3's CSV parser is itself significantly more strict about format than the old Python 2 one, so certain CSV files that Python 2 happily dealt with cannot be parsed - largely due to the switch to unicode resulting in more character encoding related critical failures. However the Go parser is a whole other level of strict: realistically it can probably handle less than 10% of the real world CSV source files out there that you might want to scrape data from into a database, whilst Python 3's can probably cope with over 80%.</span></span></div>
<div style="text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;"><br /></span></span></div>
<div style="text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;">I also investigated third party Go libraries that cater for parsing a more realistic range of CSV formatting, but found none that did so.</span></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;"><br /></span></span></div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
Conclusion</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So in conclusion, Python may not be a Gopher Snake, but for this task it does rather eat Go for breakfast. There is no ready made third party package to deal with ingesting unknown or badly formatted data like Python's aptly named <a href="https://github.com/okfn/messytables">messytables</a>. Golang may sometimes be used for writing performant concurrent data processing in data science ... but it isn't used for the scraping and cleaning data sources part of the job! However that is a minor issue compared to the major blocker of not having an existing library that can import real world (i.e. sloppy format) CSV files.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So I have written my Go package for pushing CSV files to databases, <a href="https://github.com/edcrewe/gormcsv/issues/1">gormcsv</a>, and due to Go's great concurrency features it could certainly beat django-csvimport hands down in speed terms where big data quantities of CSV sources need ingesting. But I have yet to release it, because with such poor compatibility with real CSV files there doesn't seem to be much point. However I will hopefully persist in finishing things off, probably via a less performant work around that pre-cleans CSV files into strict RFC 4180 prior to parsing, since implementing my own CSV parser from scratch in Go would likely break my original goal of coming up with an open source project in the language that would take no longer than a week!</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Oh and what do I think of Go? Well I like it, I most like the concept of classes just being data structs with bags of composed methods loosely coupled to them. I least like the error handling unseparated from normal code flow ... since it can lead to poor readability of code due to the excessive error boilerplate stuck within the program flow. It is my new favourite (statically typed) language ... but it hasn't replaced Python as my overall favourite.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
</div>
</div>
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-83196653445150397862015-08-02T17:45:00.003+01:002015-08-02T17:45:27.791+01:00Using Java Spring & MyBatis for dynamic schema integrationUsing Spring MyBatis for dynamic schema integration may perhaps be subtitled, "Eating soup with a fork" ... since that is how it has felt at times. However that may partially be due to my lack of familiarity with these tools. So this blog posting is about why I got that feeling, how I bent my fork to the purpose ... and if I missed some tricks in doing that. Please let me know by commenting below :-)<br />
<br />
<h4>
Object-relational mapping - a personal history</h4>
I have been developing relational database applications for many years so I have been through the various stages of database persistence approaches. Starting with fairly raw SQL development or at most just fixed DAO wrapping. Through SQL templating languages and mapping approaches to largely adopting the full object-relational mapping, ORM, approach for the past few years. I even tended to use the same pseudo ORM top level for NoSQL sources, although of course a lot of ORM features then become inapplicable, such as use of an ORM model's relational navigation (eg. person.department.head.surname)<br />
<br />
Having said that, the <a href="https://docs.google.com/presentation/d/1lnuV1LJgYuraWTratuJgiVVj1-SqQ8HOXmFVUPpNcz8/edit#slide=id.p">object-relational impedance mismatch</a> can still bite. However good an ORM is, there is some need to understand what its API simplifications and abstractions are doing, to avoid excessive or badly performant SQL. But given that caveat, the development and maintenance time saved by a well implemented full ORM inevitably saves money on all but the simplest persistence requirements. Similarly the automation involved in schema migration, constraints, indexing and other data modelling needs is invaluable - as long as one knows what it is doing under the hood, and hence when that 10-20% of database development / customised data design is required: for example custom functions and triggers, data writable views that present as a table for the ORM, etc.<br />
<br />
<h3>
MyBatis - is it an ORM I see before me?</h3>
Recently I have started working with a team who use an older Java technology set of Spring and MyBatis. My current project uses the Java MyBatis SQL framework to integrate various relational database sources.<br />
MyBatis badges itself as a persistence framework. So it is not really claiming to be an ORM. It started as an SQL templating engine and that is still where its core skills lie. But it now has many other features bolted on. Including meta programming to generate Java relational mappers from database sources.<br />
<br />
So it does have many ORM-ish features. However using MyBatis has been rather a painful journey for me, since I have found that it fails to implement many ORM design pattern principles that my ORM habituation had led me to expect. Also, using MyBatis with Spring MVC means we haven't got proper MVC, since the model is not a model as it would be with <a href="http://hibernate.org/orm/">Hibernate</a> ... it's a MyBatis Mapper.<br />
This is not just a minor niggle; on first use a core ORM feature is gone. A Mapper is not an object model of a table's row of data, with transaction and session management under the hood. Instead it is just a convenient object on which to hook up standard data queries and updates, with the management of saves, and the synchronisation of actions on Mappers and the results returned by them, entirely manual. So in that sense you have less than what some more basic DAO wrappers give you, i.e. when you update a record then fetch the values from that record, they will return the data written into and queried back from the database - hence an inserted, saved model would automatically hold the new primary key(s).<br />
With Mappers, the data readable from the Mapper will just be the data set to be updated, not the actual data held in the database as a result of that update. To find out the result of your query you need to ensure you get a fresh query result, by requerying or flushing the session.<br />
<br />
There is a workaround for this common requirement: you can annotate an insert with a custom select query to tell MyBatis to re-query just for the key manually, before or after doing the update. However I found this didn't work well for Oracle, and was a problem anyway for what I required, largely because I was using Generator based classes, so adding custom annotations to these would not have worked for me.<br />
<br />
<pre class="CICodeFormatter">@SelectKey(statement="SELECT MY_SEQUENCE.NEXTVAL FROM DUAL", keyProperty="nameId", before=true, resultType=int.class)
int insertMyTable(Name name);
</pre>
<br />
Similarly there is no lazy update concept: updates are only and always done when you specifically run them. In effect it's like using an ORM with all the relational modelling, data synchronisation machinery and transactional management disabled. Mappers are instead just convenience wrappers for running SQL - <a href="http://mybatis.github.io/generator/">MyBatis Generator</a> may do the ORM process of creating Java classes that (via XML) map to database tables or views, but mapper instances are not ORM-synchronised table row objects.<br />
<br />
The normal object-centric approach is also not available. Whilst you can generate code from your database, you cannot generate your database from code. So full persistence data cycle management is not available out of the box. By that I mean the use of test harness and deployment tools that destroy and recreate the full database, populate it from fixtures, run tests and drop it again. A migration tool that introspects databases and can migrate back and forth between schema versions to sync them with code versions or vice versa, ie. code -> data schema management, in addition to the MyBatis Generator data -> code direction. So the sort of thing that a full stack web framework like <a href="https://docs.djangoproject.com/en/1.8/topics/db/">Django</a> has with its ORM.<br />
<br />
This was a blow for my particular project, since full data cycle management was exactly what I required for developing a central data aggregation and integration database. I also required proper fixture management for testing, with the same serialised data input / output components being used for the main work of aggregating data from sources in serialised formats (XML, CSV etc.)<br />
Finally the system was dealing with pulling in, and delivering back out, complex evolving schemas from source to consumer databases. The one thing that would be essential is that all the schema handling and data typing be dynamic, so that the code can handle this rate of change without constantly breaking and having to be rewritten. So hard coded schema details and static typing must be avoided for any chance of maintainability. But it was to be written in a statically typed language. Again the tool was not suited to the job. So how did I bend this seemingly unsuitable platform to the task in hand?<br />
<br />
<h3>
Bending the fork</h3>
The first step was to tackle the dynamic schema vs. static typing issue - Java reflection is your friend here. Then reinvent some of the missing ORM wheels: fixture loading and schema generation.<br />
Along with working around the lack of object data synchronisation and automatic session management. Finally we wanted to make our data population speedy by only updating data that needed updates, so a hashing mechanism that could work with the serialised data was required.<br />
Note that our in-house Java standards decree that all configuration should be done via annotations rather than XML where available ... so that is pretty much everything in Spring ... aside from the core MyBatis Mapper XML files. The aggregation database was Oracle, and various databases including MS SQL Server were the data sources.<br />
<br />
<h4>
Dynamic schema handling</h4>
Use MyBatis Generator, adding a config file for each database source and for the target aggregation database.<br />
Add the name of each table or view required to these config files ... eg.<br />
<pre class="CICodeFormatter">&lt;table tableName="PERSON" modelType="flat" /&gt;</pre>
<br />
Ideally that would be all the schema customisation required, as long as the data source and target naming of tables and columns matched. However in a handful of cases mapping data (eg. tableA.colB = tableC.colD) was needed in the properties files.<br />
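Parsing such mapping entries is straightforward with the standard Properties class. A minimal sketch, assuming a "tableA.colB = tableC.colD" line format - the ColumnMappings class and the example table / column names are illustrative, not the real project's:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch: parse "tableA.colB = tableC.colD" style mapping
// entries from a properties file into a source -> target column lookup.
public class ColumnMappings {

    // Map of "TABLE.COLUMN" in the source to "TABLE.COLUMN" in the target.
    public static Map<String, String> load(String propsText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propsText));
        Map<String, String> mappings = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            mappings.put(key.trim(), props.getProperty(key).trim());
        }
        return mappings;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> m = load("PERSON.SURNAME = STAFF.LAST_NAME\n");
        System.out.println(m.get("PERSON.SURNAME")); // STAFF.LAST_NAME
    }
}
```

In the real configuration these exceptions to the name-matching default would be read alongside the generator config for each source database.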
<br />
I ran this as part of the Maven build at first, but it proved more manageable to be able to run code generation for the source and target databases at different times, so I switched to wrapping it up in a command class triggered independently via a batch job scheduler, in the same way as the data population commands were to be run. Schema -> code generation tended to be much faster than either data population or code -> schema generation, so frequent execution was not an issue.<br />
<br />
The core class I wrote for enabling dynamic schema handling was called TableMeta and each table has an instance of this class listing all the metadata about its columns and primary keys etc.<br />
<br />
This class was injected into a ModelFactoryService whose job is to return results or add, edit and delete models via any Mapper.<br />
The factory service has a reflection based method for invoking the MyBatis generated Mapper methods, e.g. "selectByExample" ...<br />
<br />
<pre class="CICodeFormatter">protected Object invokeMapperMethod(String methodName, Object parameter) {
    Class&lt;?&gt; klass = exampleClass;
    if (parameter != null) {
        klass = parameter.getClass();
    } else {
        parameter = newExample();
    }
    try {
        Method method = mapperClass.getDeclaredMethod(methodName, klass);
        try {
            // Run the method
            return method.invoke(mapper, parameter);
        } catch (IllegalAccessException | IllegalArgumentException
                 | InvocationTargetException exRun) {
            Logger.getLogger("ModelFactoryService").log(
                Level.SEVERE, modelClass.getSimpleName() + "." + methodName, exRun);
        }
    } catch (NoSuchMethodException | SecurityException exCall) {
        Logger.getLogger("ModelFactoryService").log(
            Level.SEVERE, "Mapper class has no " + methodName + " method", exCall);
    }
    return 0;
}
</pre>
<br />
<br />
... similarly there were reflection based set and get column methods, surfaced by whole-Mapper data modification methods such as doUpdate(Object model).
<br />
The TableMeta caters for the initial setup of the classes generated for that particular table by MyBatis Generator. Straight use of Class.forName(className) works, with manipulation of the table name string copying the Generator's standard naming convention. The table's Mapper, Model and Example classes are then added as the template classes for the factory. From that, a hash of the Setter and Getter Mapper methods for each of the table's columns can be built automatically.<br />
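The naming convention manipulation and the reflection built setter hash can be sketched in plain Java. Here the TableMetaSketch class and the Person stand-in model are illustrative names, not the project's real classes - the Person class takes the place of a real MyBatis Generator model:

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the Generator's CamelCase naming convention and
// of building a column-name -> setter Method map via reflection.
public class TableMetaSketch {

    // MY_TABLE -> MyTable, matching the Generator's class naming convention.
    public static String toClassName(String tableName) {
        StringBuilder sb = new StringBuilder();
        for (String part : tableName.toLowerCase().split("_")) {
            if (!part.isEmpty()) {
                sb.append(Character.toUpperCase(part.charAt(0)))
                  .append(part.substring(1));
            }
        }
        return sb.toString();
    }

    // Build a map of column name -> setter Method from a model class.
    public static Map<String, Method> setterMap(Class<?> modelClass) {
        Map<String, Method> setters = new HashMap<>();
        for (Method m : modelClass.getMethods()) {
            if (m.getName().startsWith("set") && m.getParameterCount() == 1) {
                setters.put(m.getName().substring(3).toUpperCase(), m);
            }
        }
        return setters;
    }

    // Stand-in for a MyBatis Generator model class.
    public static class Person {
        private String surname;
        public void setSurname(String surname) { this.surname = surname; }
        public String getSurname() { return surname; }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toClassName("MY_TABLE") + "Mapper"); // MyTableMapper
        Person p = new Person();
        setterMap(Person.class).get("SURNAME").invoke(p, "Crewe");
        System.out.println(p.getSurname()); // Crewe
    }
}
```

The real TableMeta would pass the derived names to Class.forName to load the Mapper, Model and Example classes, rather than hard-coding a stand-in.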
<br />
<h4>
Fixture loading</h4>
A SAX parser based loader class reads in data fixtures in XML format. Each fixture row is parsed to a hash of tag name to value, which can be passed via a SaverService to the ModelFactory.<br />
An ObjectConvertor static class caters for de-serialising fixture data to the correct Java types, looked up via the column name from TableMeta; the Setter hash can then be used to update the matching named columns in the model.<br />
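The row-to-hash parsing step can be sketched with the JDK's SAX support. The element names ("row" and lower-cased column tags) follow the dump format described later in this post, but the FixtureLoader class itself is an illustrative stand-in for the real loader:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch of a SAX based fixture loader: each <row> element is
// parsed into a map of column tag name -> text value, ready to hand to a
// saver service for type conversion and model population.
public class FixtureLoader extends DefaultHandler {
    private final List<Map<String, String>> rows = new ArrayList<>();
    private Map<String, String> row;
    private StringBuilder text;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("row".equals(qName)) {
            row = new HashMap<>();      // start collecting a new fixture row
        } else if (row != null) {
            text = new StringBuilder(); // start collecting a column value
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("row".equals(qName)) {
            rows.add(row);
            row = null;
        } else if (row != null && text != null) {
            row.put(qName.toUpperCase(), text.toString());
            text = null;
        }
    }

    public static List<Map<String, String>> parse(String xml) throws Exception {
        FixtureLoader handler = new FixtureLoader();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fixture><row><surname>Crewe</surname></row></fixture>";
        System.out.println(parse(xml)); // [{SURNAME=Crewe}]
    }
}
```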
<br />
<h4>
Fixture dumping</h4>
For fixture dumping I cheated a little. Since I was only ever dumping from the target Oracle database, then rather than reinventing <a href="https://code.djangoproject.com/wiki/Fixtures">full serialisation</a> of models (e.g. an ORM with built in serialisation from any database to XML, JSON, YAML, CSV etc.) I just provided a tool to do the minimum required for my testing and development needs - to be able to serialise any table or view to XML from Oracle.<br />
So in this case, to avoid the maintenance madness of a separate statically typed, fixed schema solution for each table - I did the XML part in Oracle.<br />
That way I could use a GenericMapper which generated two string fields based on a dynamic SQL query built from the TableMeta: the first being the row XML, and the other the concatenated list of the serialised columns to be contained within it. The MyBatis @SelectProvider annotation allows the gluing on of a method taken from another class to generate SQL ...<br />
<br />
<pre class="CICodeFormatter">@SelectProvider(type = TableDump.class, method = "getXMLProvider")
List&lt;GenericModel&gt; getXML(@Param("meta") TableMeta meta, @Param("rows") final int rows);
</pre>
<br />
The method glued on is this one which uses Oracle's native XML methods and the TableMeta's list of columns to generate a query that directly returns the XML..
<br />
<br />
<pre class="CICodeFormatter">/**
 * Dynamic SQL generator method generates XML output for fixture as row and cols
 */
public String getXMLProvider(Map&lt;String, Object&gt; params) {
    final TableMeta meta = (TableMeta) params.get("meta");
    final int rows = (int) params.get("rows");
    String sql = new SQL() {
        {
            String rowXML = " '' || XMLElement(\"row\", XMLAttributes(";
            for (String col : meta.getPKeys()) {
                rowXML += col + " AS \"" + col.toLowerCase() + "\", ";
            }
            rowXML = rowXML.substring(0, rowXML.length() - 2) + ")) \"ROWKEYS\"";
            SELECT(rowXML);
            String queryXML = "''";
            for (String col : meta.getCols()) {
                queryXML += " || XMLElement(\"" + col.toLowerCase() + "\", " + col + ")";
            }
            SELECT(queryXML + " \"ROWCOLS\"");
            FROM(meta.getTableName());
            for (String col : meta.getPKeys()) {
                ORDER_BY(col);
            }
            if (rows > 0) {
                WHERE("rownum &lt;= " + rows);
            }
        }
    }.toString();
    return sql;
}
</pre>
<br />
This above snippet shows an example of MyBatis SQL templating - the core of MyBatis.<br />
The result is a single GenericMapper that, called with the appropriate table's TableMeta, can return an XML dump of any table in the target database. For test fixture generation this can be passed a result length ... since usually ten or twenty rows will be sufficient for integration testing.<br />
<br />
<h4>
Save methods</h4>
<div>
I won't go into the save methods in detail - suffice to say the data was hashed on the way into Oracle, and then a query of primary keys to last-modified hashed data was returned for all rows, to allow incremental update by hashing each of the XML data source's rows and checking it first to determine whether an update or insert was required. Whilst this could not take advantage of the database's or Java's native hashing - since it had to be compatible with the serialised data - it did have a significant impact. The rate of incremental updates meant that we were getting a maximum of 1% data churn per table, so even with all the hashing and comparison overhead, an incremental update is still around 30 times faster.</div>
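The incremental check described above can be sketched as follows. The IncrementalHasher class, the choice of SHA-256, and the in-memory hash store are all illustrative assumptions - in the real system the previous hashes came back from an Oracle query of primary keys to last-modified hash values:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the incremental update check: each serialised
// source row is hashed, and only rows whose hash differs from the one
// recorded at the last load need to be written to the database.
public class IncrementalHasher {
    // Primary key -> hash of the row data as recorded at the last load.
    private final Map<String, String> lastHashes = new HashMap<>();

    // Hash the serialised row text; hex-encoded so it is storable as a column.
    public static String hash(String serialisedRow) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(serialisedRow.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // True if this row needs an insert or update; records the new hash.
    public boolean needsWrite(String primaryKey, String serialisedRow) throws Exception {
        String h = hash(serialisedRow);
        String previous = lastHashes.put(primaryKey, h);
        return !h.equals(previous);
    }

    public static void main(String[] args) throws Exception {
        IncrementalHasher hasher = new IncrementalHasher();
        System.out.println(hasher.needsWrite("1", "<surname>Crewe</surname>")); // true
        System.out.println(hasher.needsWrite("1", "<surname>Crewe</surname>")); // false
    }
}
```

Hashing the serialised form, rather than the database's own values, is what keeps the comparison compatible with the XML / CSV input side of the pipeline.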
<div>
<br />
For versioned data, a core data table and a versioning table were required, and both then needed access to the new primary keys. As mentioned, this is not straightforward in MyBatis - so the easiest solution was to have a bespoke version Mapper that just wrapped a single versioning sequence, and could be used to update the related data and version tables by calling the custom VersionMapper's nextVal or currVal methods.</div>
<div>
<br />
<h4>
Code Schema Cycle</h4>
As mentioned, MyBatis doesn't cover the code to schema half of the persistence cycle, so the option was to employ a separate full schema life cycle framework such as <a href="http://www.liquibase.org/">Liquibase</a>, or roll my own. In this case my requirements were not for extensive migration features, since as an aggregation database the schema could be snapshotted, dropped and rebuilt. So to avoid complexity and further dependencies I just added a tool to build the Oracle schema from standard dumps of it, made with an Oracle client tool. You dump each object to a separate DDL SQL file, then the tool checks all the files, works out which object type each relates to, and loads them in a sequence which should prevent referential integrity clashes, ie.<br />
<br />
DATABASE LINK, SEQUENCE, TABLE, MATERIALISED VIEW, FUNCTION, VIEW, INDEX, PROCEDURE, TRIGGER, CONSTRAINT, FOREIGN KEY<br />
<br />
This is wrapped up as a DatabaseReset command that either drops all the data or the full database, and rebuilds the schema. So a run of reset followed by the GenerateMappers command gives a freshly built persistence layer and the Model code to talk to it. It would be nice to have the schema built from the Mapper code, and a more standard, coherent code schema cycle, but given Mappers are in use, that would not be available even if Liquibase were in the mix.<br />
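The dependency-safe load ordering can be sketched as a simple sort over the dumped file names. The DdlLoadOrder class and the "NAME.TYPE.sql" file naming convention are illustrative assumptions - the real tool would detect the object type however the client tool's dump names its files:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of ordering dumped DDL files so that dependent
// objects load after the objects they reference.
public class DdlLoadOrder {
    private static final List<String> ORDER = Arrays.asList(
        "DATABASE LINK", "SEQUENCE", "TABLE", "MATERIALISED VIEW", "FUNCTION",
        "VIEW", "INDEX", "PROCEDURE", "TRIGGER", "CONSTRAINT", "FOREIGN KEY");

    // e.g. "PERSON.TABLE.sql" -> rank of "TABLE" in the load sequence.
    // Assumes the object type is the second-to-last dot-separated part,
    // with underscores for spaces in multi-word types.
    public static int rank(String fileName) {
        String[] parts = fileName.split("\\.");
        String type = parts.length > 1 ? parts[parts.length - 2].replace('_', ' ') : "";
        int i = ORDER.indexOf(type);
        return i < 0 ? ORDER.size() : i; // unknown types load last
    }

    public static List<String> sorted(List<String> fileNames) {
        return fileNames.stream()
                .sorted(Comparator.comparingInt(DdlLoadOrder::rank))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "PERSON_PK.CONSTRAINT.sql", "PERSON.TABLE.sql", "PERSON_SEQ.SEQUENCE.sql");
        System.out.println(sorted(files));
        // [PERSON_SEQ.SEQUENCE.sql, PERSON.TABLE.sql, PERSON_PK.CONSTRAINT.sql]
    }
}
```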
<br /></div>
<div>
<h3>
Spring issues</h3>
So as a newbie to MyBatis I guess I had some problems with it. But as long as this slightly irascible post already is ... guess what, I had issues with Spring too. So I will make it even longer by getting those off my chest as well :-)<br />
<br />
It seems to require huge amounts of configuration to do what you want. Again it can be made to do most things, but ... maybe to some extent because of that ... it takes an awful lot of configuration to do some of the things a more opinionated full stack web framework would do out of the box. I guess I need to come across the sort of edge cases which are really difficult in a full stack framework to appreciate Spring, since currently it feels like it needs a great deal of maintenance-heavy tinkering to do some of the basics.<br />
<br />
So the first surprise was that all my Spring @Service classes default to singletons. In order to have, for example, a ModelFactory for each of the two different tables a versioned update required, I needed to build a BeanFactory, make these beans, and annotate them as prototypes - ie. normal classes, not singletons. I guess this is because all of this related to command classes, not Spring MVC web classes ... which would have had web session scope.<br />
<br />
<pre class="CICodeFormatter"> @Bean
@Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public ModelFactoryService modelService() {
return new ModelFactoryService(prop, hashService());
}
</pre>
<br />
So all my individually injected service classes tended to have to be un-injected, and the bean factory injected in their place.<br />
<br />
Along with that, the database session to mapper class connection seemed rather frail.<br />
The only way to ensure this worked was never to use a generic session handling method, but always to use separate prototype beans for separately named session classes, annotated to find the correct mappers for that particular database ...<br />
<br />
<pre class="CICodeFormatter">@Configuration
@MapperScan(basePackages = "uk.ac.bris.adt.erp.dataint.sources.mm", sqlSessionFactoryRef = "MMSessionFactory")
public class MMDBConfig extends DBConfigABC { ... }
</pre>
<br />
... not only that, the directory hierarchy matters, due to the niceties of <a href="https://mybatis.github.io/spring/">MyBatis-Spring</a>, since MapperScan will find anything at or below a directory ... so you cannot put mappers for one connection below the point you need to scan for another ... or they will be sucked up as Mappers for the wrong session.<br />
<br />
<h4>
Test configuration</h4>
Also the test configuration for integration tests required a lot of manual annotations, to pick up the appropriate configuration environment and inject things into the context in a way that matches the running code configuration. Again I had been spoilt by expecting a framework specific, automatically working test harness, with default test customisations of the runtime code environment thrown in. Instead each integration test needed to use a wrapper that called a RunnerBeanConfig.<br />
<br />
<pre class="CICodeFormatter">@PropertySource("classpath:test.properties")
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(loader = AnnotationConfigContextLoader.class, classes = ITRunnerBeanConfig.class)
public class ModelFactoryServiceIT extends ServiceABC { ... }
</pre>
<br />
The RunnerBeanConfig then needs to have all the database and resources configs annotated as Imports or it cannot find the different database sessions. Only after all that is it possible to inject the runtime command classes for testing.<br />
<br />
<pre class="CICodeFormatter">@ComponentScan("uk.ac.bris.adt.erp.dataint")
@PropertySource("classpath:test.properties")
@Configuration
@Import({ DataIntDBConfig.class, MMDBConfig.class ... ResourceConfig.class })
public class ITRunnerBeanConfig implements EnvironmentAware {
private Environment env;
@Override
public void setEnvironment(Environment environment) {
this.env = environment;
}
...
}</pre>
<br />
<h3>
Conclusion</h3>
OK, so I got it working in the end. However it all felt rather like I was doing something rather bespoke and complicated. This added a great deal of development time (or more accurately, configuration wrangling time) to deliver components of persistence layer management and testing that I had previously expected would all be available already in a mature high level framework.
So in summary I probably have to conclude that if you need to do this task in a more maintainable and standardised manner, with a minimum of custom code: don't use Spring and MyBatis. However if you already use one of those tools ... and you need to add this sort of functionality with them ... then it is certainly doable, and perhaps if you are a newbie to them, like me, this post may be of some use in speeding the job up for you :-}
</div>
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-89886786957955593482015-07-30T20:17:00.005+01:002015-08-10T15:35:13.650+01:00Picking a language for shell scripting based on its framework ecosystem<br />
<ul>
<li>Do you work somewhere that has a lot of old shell scripts? </li>
<li>Were they all dashed off quickly according to the whims of their original creator - just deploy scripts so maybe they missed some of the care (and documentation) the application code had?</li>
<li>Whenever a major revision of said shell scripts is needed, do all of them end up in the bin - sorry, trash, in case you thought I meant /usr/bin - because it's easier to write such platform and author specific scripts from scratch than to maintain them?</li>
</ul>
<br />
Well, the team I recently joined does. As somebody who personally moved from shell scripts to a shell compatible, object orientated scripting language some time ago, and then on to a shell framework, I was asked to assess what to move our menagerie of bash, zsh, csh, PowerShell etc. scripts to.<br />
<br />
My remit was to recommend a language instead of all those developer specific procedural shell scripts. After all the point of a mainstream language is to introduce cross platform code with a huge set of useful libraries, along with hopefully some standard approaches. Hence get above the sea of shells.<br />
<br />
However my preference is to move the team to using a shell framework.<br />
<br />
Because what a framework does is give you opinions. It chooses how to do stuff, so you ... and all the developers that come after you ... don't have to. In my opinion most of the productivity comes from everyone becoming familiar with the same way of doing things ... whether or not it's the best way<br />
(and if it isn't the best way the framework developers spend a lot of time fixing that way with hopefully the minimum of disruption to its API for the end users).<br />
<br />
I felt it would be useful to see how active the shell framework ecosystem of a language was, in order to assess how suited to shell scripting it is. Since the goal of a shell framework is essentially the same goal, standardisation of shell scripts. So one indicator for the most suited language is to list the mainstream shell frameworks for the contenders and see which comes out top.<br />
<br />
It was decided we would choose from Javascript, Python or Perl. This is for a team that almost exclusively uses Java, which due to its pre-compiled, statically typed nature is inherently poorly suited to shell scripting.<br />
Unfortunately Ruby was not to be a candidate. However I have added it anyway, because if it's good enough for Amazon, and for leading configuration management frameworks such as Puppet ... it should be included.<br />
<br />
Strangely it is difficult to find definitive lists of shell frameworks ... perhaps, whilst I thought the term had been around for 10 years or more, it has not been coined very effectively ... it's not even on Wikipedia ... maybe I made it up! Anyway they exist whether or not there is a proper name for them ... they also often cross over the grey area into being full blown configuration management tools.<br />
So to avoid that argument ... I will add those in as well. I have marked entries with a * where I think it is more a config management tool than a shell framework.<br />
<br />
For configuration management systems <a href="https://en.wikipedia.org/wiki/Comparison_of_open-source_configuration_management_software">wikipedia does have an entry</a> ... so using it as a similar benchmark it gives a relative language score of:<br />
<br />
Python 11, C 5, Perl 5, Java 4, Ruby 3,<br />
and 1 each for C++, Scala, Erlang, Go and PHP (none for Javascript)<br />
<br />
So on the same sort of basis I have ranked the languages with the most healthy shell framework ecosystem ... in the belief that it is a good indication that they may be the most suited to being used for shell scripting ...<br />
<br />
<h4>
Python 8</h4>
Python has the largest number of well established shell frameworks, as it does in terms of configuration management systems. So on that basis I could award it first prize as the most used language for standardising shell scripting. However numbers are not everything: it doesn't necessarily have the leaders of those two categories of software (see Ruby below).<br />
<ol>
<li><a href="http://www.fabfile.org/">fabric</a></li>
<li><a href="http://docs.openstack.org/developer/cliff">cliff</a></li>
<li><a href="http://python-deploy-framework.readthedocs.org/">python-deploy-framework</a></li>
<li><a href="http://cea-hpc.github.io/clustershell">clustershell</a></li>
<li><a href="http://builtoncement.com/">builtoncement</a></li>
<li><a href="http://docs.ansible.com/">ansible</a> *</li>
<li><a href="http://saltstack.com/">salt</a> *</li>
<li><a href="http://bcfg2.org/">bcfg</a> *</li>
</ol>
<h4>
Ruby 6</h4>
<div>
Ruby has Capistrano, which is possibly the most popular of all shell frameworks; it also has two of the most popular configuration management systems, Puppet and Chef. So on that basis it should perhaps be first, or at least joint first with Python, as the most appropriate language for modern shell scripting.</div>
<div>
<ol>
<li><a href="http://capistranorb.com/">capistrano</a></li>
<li><a href="https://github.com/commander-rb/commander">commander</a></li>
<li><a href="https://github.com/mina-deploy/mina">mina</a></li>
<li><a href="http://rubyhitsquad.com/Vlad_the_Deployer.html">vlad the deployer</a></li>
<li><a href="https://puppetlabs.com/">puppet</a> *</li>
<li><a href="https://www.chef.io/">chef</a> *</li>
</ol>
</div>
<h4>
Perl 3</h4>
<div>
Perl has things in related areas that perhaps the other languages don't. So a number of <a href="https://metacpan.org/release/PerlPowerTools">shell implementation</a>s in Perl, and frameworks for writing custom shells. But in terms of what I mean by a shell framework - ie. a tool to run shell commands in a standard way across different servers, shells and platforms - it is surprisingly lacking.</div>
<div>
<ol>
<li><a href="https://metacpan.org/pod/Shell::Tools">Shell::Tools and Shell::Tools::Extra</a></li>
<li><a href="https://www.rexify.org/">Rex</a> *</li>
<li><a href="http://www.quattor.org/">quattor</a> *</li>
</ol>
</div>
<h4>
Javascript 2</h4>
Ummm, well, Javascript may have broken free of the browser with node.js and other server side runtime environments. But are there any shell frameworks or configuration management systems built with it - ie. tools designed for running shell commands and doing anything other than Javascript installation? Not much TBH, and certainly no configuration management engine.<br />
<ol>
<li><a href="https://github.com/tj/commander.js">commander.js</a> (clone of Ruby commander)</li>
<li><a href="https://www.npmjs.com/package/shelljs">shelljs</a> is the leading bunch of utilities for node.js that can act as a shell framework</li>
</ol>
<br />
<br />
<h3>
Conclusion</h3>
It's a tie between Python and Ruby, so the decision as to which to go for depends on the individual tools that best suit your shell framework requirements, what they may need to integrate with wrt. configuration management and deployment tools ... plus the existing skills of your developers.<br />
Oh and whether diversity is more important than being the market leader ... or vice versa.<br />
<br />
For my own case, Ruby isn't in the running, so on this premise that makes it Python.<br />
I may follow up this blog post with one comparing the languages wrt. a sample shell script - to see which looks the most maintainable, as a second means of deciding between them.<br />
<br />
<b>Note</b>: One other factor is related virtual frameworks ( ... yep, made that term up too!)<br />
Basically these wrap up the configuration and deployment of virtualised software - either hypervisors or containers - to give a full platform application build, usually for development purposes.<br />
So here Ruby is well ahead, with <a href="https://www.vagrantup.com/">Vagrant</a> along with <a href="http://boxgrinder.org/">BoxGrinder</a>.<br />
<h3>
</h3>
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-5407500677931814702014-11-16T13:34:00.003+00:002014-11-16T13:42:16.593+00:00The 10 commandments of maintainable web servicesHere is a list of the ten core elements needed for a development to deployment phase infrastructure to provide a stable service for your web applications, while minimising time wasted on bugs and issues unrelated to functional development, and slashing maintenance time and cost compared to systems without them. I guess it could also be called automation, automation, automation ...<br />
<br />
It should be noted that just because an application is a legacy one, it does not mean that all of this infrastructure cannot be retro-fitted to it. *<br />
<ol>
<li><b>Standard environment</b><br />A set of consistently built and upgraded deployment phase environments - dev, demo, train, prod for the full application stack - e.g. app server, cache, web server and storage. All development and deployment is done on these entirely standard (ideally config management / virtualised) cloned environments. If random desktop / laptop computers must be used then ideally a virtual box build version should be provided for dev, to match the deployment ones.<br />For web applications the server side will be a single environment, but if client side software is involved it may require multiple standard environments for build and test.</li>
<li><b>Automated build</b><br />Run one command or press one button to create a full application stack instance on any of the deployment environments. Including production. So this should be everything above the standard environment and ideally include storage too (see data automation). Each developer can build numbers of deployment instances in the same automated fashion. Builds should be remotely runnable for plugging into Continuous Integration, C.I., servers etc.</li>
<li><b>Automated release management</b><br />Particularly important is that no manual tasks are needed to deploy to production. A push button C.I. driven deploy should be used, where each deployment is retained in a full log, accompanied by a summary deployment note and the related software packages' release history and source tag. This full logging of changes ties into software service change management concepts. If unforeseen dependency system issues develop a lot later, they can then potentially be tied to the highly detailed timestamped change logging that this provides.<br />Automating the roll-out means that you should also automate the rollback. You will hopefully test well enough not to need a safety net, but not bothering to use one is reckless.<br />Another common loophole is that release only covers the application layer. The standard environment, storage etc. are all part of the stack, and changes in them are also releases, and need the same release management controls in place.</li>
<li><b>Revision control of the entire application stack</b><br />Everything in the application should be versioned. So all the source code of course. But also all the deployment and automation code. The third party components should all have their own versions (if not download, version and deploy from your own local repository). That includes the application specific environment configuration, eg. Apache virtual host configuration.<br />Build automation should allow specification of tags (or to a date or to any previous release - logged via C.I.)<br />The code dependency stack should also be versioned - so the versions of every component for a system release. Language specific build tools such as Maven, Pip, Ant, Phing, Bundler, Buildout etc. provide this. The standard environment(s) should also be versioned via their config management tool. </li>
<li><b>Integrated documentation</b><br />Core documentation should be written and versioned with the source code, each package should at least have README and a release HISTORY tied to each production releases version number. These need to be kept up to date with the rest of the source. Separate wikis for fuller / less technical docs are fine - but documentation of changes in functional specification need to use the same version control as the code - unless all your code has rigorous processes around a versioning integrated issue tracker - that is most reliably done by putting documentation in the code.<br />Ideally the language's packaging tools should have a system to extract embedded documentation and comments into HTML on a software repository server - for easy reference.<br />Automation to keep the web documentation up to date should be implemented. </li>
<li><b>Software upgrade process</b><br />Major version platform upgrades should always be performed within a year of the release date, and not just for security patch reasons (those must be carried out within a month at most). Ideally the former within a few months and the latter within a few days. Any longer and code divergence can make the upgrade hill too big a cost to scale, or compromise systems and data. Major language / framework upgrades (as well as releases) should not require significant system outages. These may not be automated to set up, but they should be automated to roll over between upgrades - so even if you are without a multi-server load balanced layer in part of your application stack, downtime should still be under a minute at most, e.g. an Apache or database restart.</li>
<li><b>Automated testing</b><br />It may not provide great coverage, but a minimal test suite is a necessity to allow confirmation of success for the automation infrastructure.<br />Good test coverage means that complex functional errors or regressions can be written as tests and added to periodic builds - so ensuring that future releases are free of them - but a set of minimal functional or black box tests is sufficient to cover basic confirmation that automated environment upgrades, or minor application fix releases, do not cause critical failures. These tests can also be tied to monitoring / timed load testing - to check upcoming releases for performance regressions.</li>
<li><b>Data automation</b><br />This involves data fixtures, automated schema generation and synchronisation.<br />With an object relational mapper (ORM) now standard in web applications, your system should have a full data abstraction layer, even in the most micro of web frameworks. In turn that means today's application code should contain within it the means to generate all of the data layer. Ideally ORMs should fully abstract the database implementation, generate that implementation for a range of RDBMSs, and generate data fixtures for it - for building populated new development instances or for testing.<br />As standard, the test harness will set up and tear down the data layers.<br />More mature ORMs will also have schema migration tools. These are essential for fully automated release management, since invariably a significant release will involve a change to the data schema, or at least a new entry in the database. A synchronisation tool will tend to use meta-programming to automatically generate the migration code that synchronises the schema - that migration is then released (or rolled back) as part of the code release - keeping the data storage in the release management loop. Any data modification (DML) that the application requires can be added to the DDL of the schema migration. These tools will also have introspection code to detect that data migration is required when connected to a previous version of the database. Bespoke applications may not have such a tool, but at worst they should have data creation and migration code written and packaged with newly released versions - manual database tinkering around the time of a code release is not acceptable.</li>
<li><b>Package management</b><br />Application layer package management will always be language specific, but any language should offer it. Ideally a package repository should be maintained for each language your services use. These may be core to the language, like PyPI and RubyGems, or, for languages without them in the core, commercial offerings such as Nexus for Java.<br />This caters for version dependency management and reliable upgrade. Of course, to use a package manager fully you should package all your application source code. Ad hoc scripts, framework app archives, raw class and resource bundles etc. - just say no. If you are going to release your code rather than chuck it over the wall ... package it and version it. So all your code should be in jars, eggs, gems - or whatever your language likes to call them.<br />Not only that, you should apply the same rules to splitting up packages as you apply to splitting up code into classes. Some packages may depend on others, but each separate component of the application should be a different package - to allow it to be separately version controlled and released, to encourage encapsulation, and hence to allow packages to be reused, retired or replaced without replacing the whole application's code base.<br />(NB: Environment package management will be operating system specific and should be implemented as part of the standard environment config management layer - no building from source here!)</li>
<li><b>Monitoring</b><br />One of the most important issues with logging and error notification is the cry-wolf factor. You need to draw the line in the right place for what counts as a critical error - ie. one that generates notifications to people. Over-reporting is fine initially if it makes you hammer down on all those bugs to reach a reasonable level. But the one thing that makes monitoring ineffective is sustained over-reporting: if a system has been emailing you a hundred stack traces a day for the last month - or the critical log is equally verbose - you filter the emails and ignore the log. Critical bug notifications need to be rare enough that you jump straight on fixing them when they arrive. On the other hand, don't overdo the filtering: ideally you should never be in the position where the only way you learn that a service is down is an end user phoning up to tell you. Good monitoring will always catch failures first, for all but the most involved functional errors.<br />You also need standard uptime monitoring, such as Nagios or the like, to notify you when services have failed completely (and so are unable to send application layer errors) for each of the layers - web, storage, cache, environment.<br />Plus load logging, response time logging, etc. for each. Most importantly, you need to retain the logging over time so you can look back at problems against change management data (see automated release management) to diagnose many service issues and, ideally, predict and forestall them.</li>
</ol>
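The data automation commandment above can be sketched in miniature: a release-managed migration is just DDL plus any required DML, applied (and rolled back) as one unit. A minimal illustration using Python's built-in sqlite3 module - the table and column names are hypothetical, and a real ORM migration tool would generate the equivalent of this function by introspection:

```python
import sqlite3

def migrate_0004_add_slug(conn):
    # One release's migration: DDL (new column) plus DML (back-fill),
    # run in a single transaction so it commits or rolls back as a unit.
    with conn:
        conn.execute("ALTER TABLE article ADD COLUMN slug TEXT DEFAULT ''")
        conn.execute("UPDATE article SET slug = lower(replace(title, ' ', '-'))")

# Build a populated development instance, then apply the migration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO article (title) VALUES ('Hello World')")
migrate_0004_add_slug(conn)
```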
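And the minimal black-box check from the automated testing commandment can be as small as this sketch - the health-check function is a hypothetical stand-in for a real HTTP request against the deployed service:

```python
import unittest

def service_status():
    # Stand-in for an HTTP health-check request against the deployed service
    return 200

class SmokeTest(unittest.TestCase):
    # Minimal black-box confirmation that an environment upgrade or
    # minor fix release has not caused a critical failure.
    def test_service_up(self):
        self.assertEqual(service_status(), 200)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```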
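The cry-wolf threshold from the monitoring commandment maps directly onto standard log levels - everything is recorded, but only CRITICAL reaches people. A sketch using Python's logging module, with a stub handler standing in for a real SMTPHandler or pager integration:

```python
import logging

notifications = []

class NotifyHandler(logging.Handler):
    # Stand-in for an email/pager handler: only rare, genuinely
    # critical errors should ever reach people.
    def emit(self, record):
        notifications.append(record.getMessage())

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.addHandler(logging.NullHandler())            # real setup: a file handler
logger.addHandler(NotifyHandler(level=logging.CRITICAL))

logger.error("routine stack trace")                 # logged, but nobody is paged
logger.critical("service down")                     # this one notifies
```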
<h4>
Walk the walk</h4>
<div>
So do I have the ten commandments in place for all our production systems in my current work place? In part, we have for all our Python Django web applications (although some are a bit sparse in places - eg. monitoring, release management below the application layer). But our Java architecture only has packaged components, although work is being done for new Java Spring systems to provide automated build, ideally some tests and the need for monitoring is recognized. Hopefully we will tick all ten boxes for it too, eventually. So we will have as solidly maintainable a Java Spring platform as we have with our Python Django infrastructure.</div>
<div>
<br /></div>
<div>
However the concern is perhaps as much with all our legacy or outsourced systems integration code. This has none of these components and no realistic likelihood of getting them. A huge support burden results, diverting time away from providing them and leading to unreliable services. Add the problem of platforms that are frozen whilst still in use - as with our legacy Python (Zope) architecture - which then rot and lose the maintenance infrastructure they had (our old CMS went live with half of the above features; now it has none), and the picture becomes a little bleak. Here the answer is perhaps to implement much more hard-nosed rules for retiring systems that have replacements, whether or not those replacements fully cover the same functional space. Essentially this is a management issue, not a technical one.</div>
<div>
<br /></div>
<div>
With a much reduced set of critical legacy systems and appropriate resourcing it would be possible to retrograde add the commandments to them, and bring all services up to a similar quality control.<br />
<br />
However the problem is greatly exacerbated by 'new' legacy bought-in systems. By this I mean third party supplier systems that we run and have to maintain (eg. regular upgrades, performance monitoring etc.) that do not have most of the above features. Unfortunately that appears true of all the smaller suppliers' systems procured recently - ie. companies with under 10 core developers. Perhaps because most of them are providing products that actually are legacy, ie. have not been written, or fully rewritten, in the last 6 years (for the full rant on this topic see the <a href="http://edcrewe.blogspot.co.uk/2013/11/the-ten-commandments-of-software.html">ten commandments of software procurement</a>!)<br />
<br /></div>
<h4>
* Fixing the legacy and external systems</h4>
<div>
There are plenty of configuration management and shell framework tools that can be applied to automate even the messiest old legacy systems. The key rule here is you don't need to write any of the infrastructure in the legacy code base. So use your standard CI server, shell framework and config management tools - don't add more procedural platform specific code (e.g. raw shell scripts).<br />
Modern automation tools should all be pretty platform independent - although if running Windows and Unix you may be better using a different shell framework for each, eg. Fabric and PowerShell, possibly the same for config management tools.<br />
<br />
If the code contains closed source compiled components with no versioning, then the binaries can still be put into version control and release numbers assigned. At worst, decompilation tools can be used - if there is no other reasonable way to fix or replace the components.<br />
<br />
Similarly black-box testing tools can be applied to any software, and if none of the technical team know what that code is doing - end users can provide a basic functional spec of what it's meant to do, and these few basic stories can be used to create some minimal BDD tests.<br />
Data in / data out dumps and comparisons can also be used as a basis for manually maintained fixtures. Legacy components can be split up and packaging added to them ... but much more work along this line of legacy code refactoring and we start to raise the question of whether respecify / rewrite / replace would be more cost effective.</div>
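The data in / data out approach can be as simple as a golden-file check: capture known input/output pairs from the live system, then assert that the untouched legacy component still reproduces them after any change around it. A sketch, where the transform function is a hypothetical stand-in for the black box:

```python
def legacy_transform(text):
    # Hypothetical stand-in for the legacy black-box component under test
    return text.upper()

# Input/output pairs captured ("dumped") from the production system
golden = {"hello": "HELLO", "world": "WORLD"}

def check_against_golden():
    # Regression check: the black box must still match its recorded behaviour
    return all(legacy_transform(given) == expected
               for given, expected in golden.items())
```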
<h3>Fixing third party Django packages for Python 3</h3>
<i>Ed - 2014-10-29</i><br />
With the release of Django 1.7 it could be argued that the balance has finally tipped towards Python 3 being its preferred platform. Given that Python 2.7 is the last of the 2.* series, it is probably time we all thought about moving to Python 3 for our Django deployments.<br />
<br />
Problem is those pesky third party package developers, because unless you are a determined wheel reinventor (unlikely if you use Django!) you are bound to have a range of third party eggs in your Django sites. As one of those pesky third party developers myself, it is about time I added Python 3 compatibility to my Django open source packages.<br />
<br />
There are a number of resources related to porting <a href="https://docs.python.org/3/howto/pyporting.html">Python from 2 to 3</a>, including some specifically for <a href="https://docs.djangoproject.com/en/1.7/topics/python3/">Django</a>, but hopefully this post may still prove useful as a summarised approach for your Django projects or third party packages. With luck it isn't too much work, and if you have been writing Python as long as me it may also get you out of any legacy syntax habits.<br />
<br />
So let's get started. The first thing is to set up Django 1.7 with Python 3.<br />
For repeatable builds we want pip and virtualenv - if they are not there already.<br />
On a Linux platform such as Ubuntu you will have python3 installed as standard (although not yet the default python), so if you just add pip3 that lets you add the rest ...<br />
<br />
<h4>
Install Python 3 and Django for testing</h4>
<div>
<br /></div>
<code><span style="background-color: white;">sudo apt-get install python3-pip<br />(OR sudo easy_install3 pip)<br />sudo pip3 install virtualenv</span><br />
</code>
<br />
<br />
So now you can run virtualenv with python3 in addition to the default python (2.*)<br />
<code><br /></code>
<code>
virtualenv --python=python3 myenv3<br />
cd myenv3<br />
bin/pip install django<br />
</code><br />
<code><br /></code>
Then add a src directory for putting in the egg you want to make compatible with Python 3 -
so an example from git (of course you can do this as one pip line if the source is in git)
<code><br /><br />
</code><br />
<code>mkdir src</code><br />
<code>git clone https://github.com/django-pesky src/django-pesky <br />
bin/pip install -e src/django-pesky <br /><br />
</code><br />
Then run the django-pesky tests (assuming nobody uses an egg without any tests!)<br />
so the command to run pesky's tests may be something like the following ...<br />
<br />
<code>
bin/django-admin.py test pesky.tests --settings=pesky.settings</code><br />
One rather disconcerting thing you will notice with tests is that the default assertEqual message is truncated in Python 3 where it wasn't in Python 2, with a count of the missing characters given in square brackets, eg.<br />
<code><br /></code>
<code>AssertionError: Lists differ: ['Failed to open file /home/jango/myenv/sr[85 chars]tem'] != []</code>
<br />
<code><br /></code>
<br />
<h4>
Common Python 2 to Python 3 errors</h4>
<div>
<br /></div>
And wait for those errors. The most common ones are:<br />
<br />
<ol>
<li>print statement without brackets</li>
<li>except Error as err (NOT except Error, err)</li>
<li>File open and file methods differ.<br />Text files require explicit, better quality encoding - and more file content defaults to bytes, because strings in Python 3 are all stored as unicode<br />(on the down side this may need more work for initial encoding clean up *,<br />but on the plus side functional errors due to bad encoding are less likely to occur)</li>
<li>There is no unicode() built-in in Python 3, since all strings are now unicode - ie. it has become str(), and hence strings no longer need the u'string' marker</li>
<li>Since unicode is not available as a method, it is not used for Django models' default representation. Hence just using<br />def __str__(self):<br />
    return self.name<br />is the future-proofed method. I actually found that models with both __unicode__ and __str__ methods may return no representation at all, rather than the __str__ one being used as one might assume, in Django 1.7 with Python 3</li>
<li>dictionary has_key has gone, must use in (if key in dict)</li>
</ol>
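Most of the errors listed above have a single transitional spelling that runs unchanged under both Python 2.7 and Python 3 - a sketch, where the file path and model class are illustrative only:

```python
from __future__ import print_function, unicode_literals
import io

def read_config(path):
    # Python 3 style open with an explicit encoding (io.open works on 2.7 too)
    try:
        with io.open(path, encoding="utf-8") as f:
            return f.read()
    except IOError as err:        # "except ... as err", not "except ..., err"
        print("could not read", path)
        return ""

class Article(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):            # __str__ only; no __unicode__ needed
        return self.name

settings = {"DEBUG": True}
assert "DEBUG" in settings        # "in", not the removed has_key()
```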
<br />
* I found more raw strings were treated as bytes by Python 3, and these then required raw_string.decode(charset) to avoid going into database string (eg. varchar) fields as pseudo-bytes - ie. strings that held 'élément' as '\xc3\xa9l\xc3\xa9ment' - rather than as real bytes, ie. b'\xc3\xa9l\xc3\xa9ment'<br />
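For example, the pseudo-bytes problem disappears once the raw bytes are decoded at the boundary, before they reach the database layer:

```python
raw = b"\xc3\xa9l\xc3\xa9ment"     # bytes as read from a file or socket
text = raw.decode("utf-8")         # decode once, at the boundary

assert text == "élément"           # real unicode string, not pseudo-bytes
assert isinstance(text, str)       # in Python 3, str is unicode
```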
<div>
<br /></div>
Ideally you will want to maintain one version but keep it compatible with Python 2 and 3, <br />
since this is both less work and gets you into the habit of writing transitional Python :-)<br />
<br />
<h4>
Test the same code against Python 2 and 3</h4>
<div>
<br /></div>
So to do that you want to be running your tests with builds in both Pythons.<br />
So repeat the above but with virtualenv --python=python2 myenv2<br />
and just symlink the src/django-pesky to the Python 2 src folder.<br />
<br />
Now you can run the tests for both versions against the same egg code - <br />
and make sure when you fix for 3 you don't break for 2. <br />
<br />
For current Django 1.7 you would just need to support the latest Python 2.7 <br />
and so the above changes are all compatible except for use of unicode() and how you call open().<br />
<br />
<h4>
Version specific code</h4>
<div>
<br /></div>
However in some cases you may need to write code that is specific to Python 2 or 3.<br />
If that occurs you can either use the approach of trying the latest version first and falling back to anything else (cross fingers)<br />
<br />
<code>
try:<br /> latest version compatible code (e.g. Python 3 - Django 1.7)<br />
except (ImportError, AttributeError):<br /> older version compatible code (e.g. Python 2 - Django < 1.7)<br />
</code><br />
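A common concrete case of this try/except approach is feature detection by import - the standard library modules renamed between Python 2 and 3 are the classic example:

```python
# Try the newest spelling first, fall back for older versions
try:
    from urllib.parse import urlparse      # Python 3
except ImportError:
    from urlparse import urlparse          # Python 2

parts = urlparse("http://example.com/pesky?page=1")
```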
Or you can use specific version targetting ...<br />
<code><br /></code>
<code>
import sys, django<br />
django_version = django.get_version().split('.')<br />
<br />
if sys.version_info.major == 3 and int(django_version[1]) == 7:<br /> latest version<br />
elif sys.version_info.major == 2 and int(django_version[1]) == 6:<br /> older django version<br />
else:<br /> older version</code><br />
<code><br />
</code>
where ...<br />
<br />
django.get_version() -> '1.6' or '1.7.1'<br />
sys.version_info -> sys.version_info(major=3, minor=4, micro=0, releaselevel='final', serial=0)<br />
(a named tuple - access it as sys.version_info.major or sys.version_info[0], not as a dictionary)<br />
<br />
<h4>
Summary</h4>
So how did I get on with my first egg, <a href="https://pypi.python.org/pypi/django-csvimport">django-csvimport</a> ? ... it actually proved quite time consuming since the csv.reader library was far more sensitive to bad character encoding in Python 3 and so a more thorough manual alternative had to be implemented for those important edge cases - which the tests are aimed to cover. After all if a CSV file is really well encoded and you already have a model for it - it hardly needs a pesky third party egg for CSV imports - just a few django shell lines using the csv library will do the job.<br />
<div>
<br /></div>
<br />
<br />
<br />
<br />
<br />
<br />
<h3>Spring MVC setup on Ubuntu</h3>
<i>Ed - 2014-07-03</i><br />
Recently setting up Spring MVC on Ubuntu 14 with NetBeans wasn't entirely obvious for a newbie, so I thought I would document it in case it saves somebody 10 minutes!<br />
<br />
<br />
First install Apache and Tomcat, if you haven't got them already...<br />
<br />
sudo apt-get install apache2<br />
<br />
<br />
sudo apt-get install tomcat7 tomcat7-docs tomcat7-admin tomcat7-examples<br />
<br />
You should also have the default openjdk for tomcat and ant build tool and git <br />
<br />
sudo apt-get install default-jdk ant git<br />
<br />
Edit tomcat-users.xml - NetBeans requires a user with the manager-script role<br />
(NOTE: you shouldn't give the same user all these roles in a production Tomcat!<br />
Also note that these manager roles have changed from Tomcat 6)<br />
<br />
sudo emacs /etc/tomcat7/tomcat-users.xml <br />
<br />
<br />
<pre style="-webkit-text-stroke-width: 0px; background-color: #eeeeee; border: 0px; color: black; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 17.804800033569336px; margin-bottom: 10px; margin-top: 0px; max-height: 600px; orphans: auto; overflow: auto; padding: 5px; text-align: start; text-indent: 0px; text-transform: none; vertical-align: baseline; white-space: pre-wrap; widows: auto; width: auto; word-spacing: 0px; word-wrap: normal;"><code style="border: 0px; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; margin: 0px; padding: 0px; vertical-align: baseline; white-space: inherit;"><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">tomcat</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">-</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">users</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-gui"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-script"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-jmx"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-status"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"admin-gui"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"admin-script"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">user username</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"admin"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"> password</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"admin"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"> roles</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-gui,manager-<wbr></wbr>script,manager-jmx,manager-<wbr></wbr>status,admin-gui,admin-script"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><wbr></wbr>/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"></</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">tomcat</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">-</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">users</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">></span></code></pre>
<br />
<br />
Should restart tomcat after editing this ...<br />
<br />
sudo service tomcat7 restart<br />
<br />
Now you should be able to go to http://localhost:8080 and see<br />
<br />
<pre style="-webkit-text-stroke-width: 0px; background-color: #cceeee; border: 0px; color: black; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 17.804800033569336px; margin-bottom: 10px; margin-top: 0px; max-height: 600px; orphans: auto; overflow: auto; padding: 5px; text-align: start; text-indent: 0px; text-transform: none; vertical-align: baseline; white-space: pre-wrap; widows: auto; width: auto; word-spacing: 0px; word-wrap: normal;"><h1>
It works !</h1>
If you're seeing this page via a web browser, it means you've setup Tomcat successfully. Congratulations! ...
</pre>
<br />
Click on the link to the manager and get the management screen<br />
<br />
If the login fails - reinstall apache and tomcat - it worked for me!<br />
<br />
For NetBeans to find Tomcat OK you have to put the config directory where it expects it ...<br />
<br />
sudo ln -s /etc/tomcat7/ /usr/share/tomcat7/conf<br />
<br />
Note that the Tomcat location, ie. the deploy directory, is in<br />
<br />
/var/lib/tomcat7<br />
<br />
Now install Netbeans, latest version is 8, either by download and install or <br />
<br />
sudo apt-get install netbeans<br />
<br />
Start up netbeans and go to Tools > Plugins<br />
<br />
Pick the Available plugins tab<br />
<br />
Search for web and tick Spring MVC - plus any others you fancy!<br />
<br />
Restart Netbeans<br />
<br />
Add a new project<br />
<br />
<ol style="-webkit-text-stroke-width: 0px; background-color: white; color: #333333; font-family: Arial, Helvetica, Verdana; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.799999237060547px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">Choose New Project (Ctrl-Shift-N; ⌘-Shift-N on Mac) from the IDE's File menu. Select the Java Web category, then under Projects select Web Application. Click Next.</li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">In Project Name, type in<span class="Apple-converted-space"> </span><b>HelloSpring</b>. Click Next.</li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">Click the Add... button next to the server drop down </li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;"><div style="margin: 0px; padding: 0px 0px 3px;">
Select the Apache Tomcat or TomEE server in the Server list, click Next</div>
<div style="margin: 0px; padding: 0px 0px 3px;">
Enter Server Location: /usr/share/tomcat7 </div>
<div style="margin: 0px; padding: 0px 0px 3px;">
Enter the username and password from your tomcat-users.xml above and untick the create user box, if everything is working then it will accept this and add Tomcat to your server drop down list </div>
<div style="margin: 0px; padding: 0px 0px 3px;">
(it shouldn't need to try to add the user unless that user isn't already properly set up with the manager-script role in Tomcat) </div>
</li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">In Step 4, the Frameworks panel, select Spring Web MVC.</li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">Select<span class="Apple-converted-space"> </span><b>Spring Framework 3.x</b><span class="Apple-converted-space"> </span>in the Spring Library drop-down list.<span class="Apple-converted-space"> </span><br /><img alt="Spring Web MVC displayed in the Frameworks panel" class="margin-around b-all" src="https://netbeans.org/images_www/articles/80/web/spring/frameworks-window.png" style="border: 1px solid rgb(173, 173, 173); margin: 10px;" title="Spring Web MVC displayed in the Frameworks panel" /></li>
</ol>
<br />
Click Finish and you should have a skeleton Spring MVC project, pressing the Play button should build it and run it up, then launch your chosen browser with the home page of that project via the Apache Tomcat you have setup.<br />
Any changes should get auto-deployed and popped up in the browser again by pressing play.<br />
<br />
<br />
<h3>Lessons learned from setting up a website on Amazon EC2</h3>
<i>Ed - 2014-05-02</i><br />
I recently got involved with helping someone sort out their website on an Amazon EC2 instance. It had been a few years since I had needed to do anything with EC2, and I realised I was a novice in this world - it raised a number of issues related to deploying to EC2 and performance.<br />
<br />
So I thought it may be useful to run through them for any other EC2 novices who are asked to do something similar, and want to learn from my rather blundering progress through this :-) <br />
<br />
Apologies to those of you who are already well familiar with EC2 for covering some of the basics.<br />
<br />
The system <a href="http://moodpin.co.uk/">moodpin.co.uk</a> was based on a commercial PHP application, Pintastic.<br />
This allows you to set up a site like <a href="http://pinterest.com/">pinterest.com</a> or <a href="http://wanelo.com/">wanelo.com</a>.<br />
These sorts of sites are for creating subject-specific photo sharing social media systems - like Instagram, Picasa etc. but focussed around communities of shared (usually commercial) interest, for example buying shoes or interior decor.<br />
The common UI they tend to present is big scrolling pages of submitted images related to topics, for sharing, comment and discussion.<br />
<br />
So this system sends out a lot of notification emails, involves displaying hundreds of images per page - the visual pin board - and to help with performance has custom caching built in - triggered by cron jobs.<br />
<br />
Hence we have a number of cron jobs with the caching ones running every couple of minutes. To me this appeared a pretty crude caching mechanism - but my job was not to rewrite the application, but just tweak the code and get it all running OK.<br />
The code mainly uses a standard MVC approach like everything else these days!<br />
<br />
Demonstrating how outdated my knowledge of both EC2 and this application was, I thought OK - first of all, what platform is it? It was Amazon's own <a href="http://aws.amazon.com/amazon-linux-ami/2012.09-release-notes/">Linux </a>- this uses yum rather than apt for package installs, so as distros go it's perhaps more Redhat-like than Debian-like.<br />
<br />
For those unfamiliar with the basics - go to <a href="http://aws.amazon.com/">Amazon web services</a> and sign up!<br />
You can then choose to add some of the 40-odd different services that are available under the AWS umbrella.<br />
<br />
Once you have signed up to a few of these, you get a management console that links to a control dashboard for each service. The first step is usually the one with the compute instances on: EC2. From there you can pick an AMI (ie. an operating system image) and a zone - eg. <b>US West (Oregon)</b> - and use them to create a new instance. Add an SSH key pair for shell access, fire it up, and download the pem file so you can ssh into your new Amazon box.<br />
<br />
So the client wanted the usual little tweaks to PHP code and CSS - easy stuff, it's just web development ... done in a jiffy (well, after digging through the MVC layers, templating language, cache issues, CSS inheritance etc. of a fairly complex PHP app you have never come across before, when PHP is not exactly your favourite language ... jiffyish, maybe) <br />
Then we got to the more SysAdmin-related requests ... let's just say I probably shouldn't rush out and buy a DevOps tee-shirt just yet ...<br />
<br />
<b>'Get email working</b>'<br />
<br />
<ol>
<li>Try to send an email from the web application; write a plain PHP script that just sends a test email; run mail from the Linux command line ... Got it, there is no MTA installed! </li>
<li>Install an MTA - sendmail. Go back up that stack of actions and they are all working ... hurray that was easy.</li>
<li>A week or so later ... 'emails stopped working'</li>
<li>Go back to step 1. and yep - emails stopped working</li>
<li>Look at the mail logs and see what the problem is.</li>
<li>Realise that there are masses of emails being sent out ... but all of it is bouncing back as unverified.</li>
<li>Think ... wow that pintastic site's notifier is busy - must be getting lots of traffic *</li>
<li>So why has Amazon started bouncing all the email?</li>
<li>Search Amazon's docs. Amazon has a very minimal test quota allowed for email. Once that quota is filled, unverified email will be blocked.</li>
<li>Amazon has historically been one of the main sources of SPAM machines; that history means it has had to set up a much more elaborate mechanism for validating email than most hosting companies, and it no longer allows direct emailing from EC2 boxes (apart from minimal test quotas)</li>
<li>So what we need to do is set up our mail to be sent via the Amazon SES service - add SES service and enable it</li>
<li>So now we need to send authorised emails to the Amazon SES gateway that will then forward them on to the outside world</li>
<li>Try to get sendmail to send authenticated emails, follow guide but it continues to bounce with authentication failure, give up and install <a href="http://docs.aws.amazon.com/ses/latest/DeveloperGuide/postfix.html">postfix</a>, follow the 20 steps of setting up the SASL password etc., and eventually it doesn't bounce with authentication errors - hurray!</li>
<li>But the email still bounces. So we need to verify all our sending email addresses - managed by the SES console - or use DKIM to get the whole domain verified and signed from which we are sending.</li>
<li>Modify the emails used by the sending software to ones which we can receive and validate - send and validate them. Our emails are working again.</li>
<li>Leave it a few days, we are not sending email anymore, boooo!</li>
<li>Check all the SES documentation, surprise, surprise SES also has quota limits for test level only, and you have to formally apply to get those limits lifted.</li>
<li>Contact the client and get him to make a formal request for quota lifting on his account.</li>
<li>*As part of the investigation check that email log a little more closely, it seems rather large, and we seem to be using up our quotas really quickly ... ah the default setup for unix cron sends an email for every job that returns text. The pintastic cache job returns text, so we are sending a pointless email every two minutes ... or trying to ... whoops. Make sure no cron or other unix system command is acting as a SPAM bot. </li>
<li>A few days later - Amazon say our quota has been lifted</li>
<li>Our emails have started sending again ... and they are still sending today !!!</li>
</ol>
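For reference, the postfix-to-SES relay that steps 12-13 eventually arrived at boils down to a few lines in /etc/postfix/main.cf. This is a minimal sketch rather than a drop-in config - the endpoint shown assumes the US East SES region (substitute your own region's endpoint), the CA bundle path varies by distro, and /etc/postfix/sasl_passwd must hold the SMTP credentials that SES issues you:

```
# /etc/postfix/main.cf - relay all outbound mail via Amazon SES
relayhost = [email-smtp.us-east-1.amazonaws.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_tls_security_level = encrypt
smtp_tls_CAfile = /etc/ssl/certs/ca-bundle.crt
```

After editing, run postmap /etc/postfix/sasl_passwd and restart postfix - and remember that the sending addresses (or the DKIM-signed domain) still have to be verified in the SES console before anything leaves the building.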
Client's response: OK thanks - by the way, since we added all the start-up data (ie. uploaded images), the site takes at least two minutes to render the home page - or times out altogether. <br />
Hmmm, I did kinda notice that ... but hey, he hadn't asked me to make the site actually usable speed-wise ... until now!<br />
<br />
<b>'Why is the site, really, really slow?'</b><br />
<br />
<br />
Hmmm, wow, it really is slow; lots of the time it just dies. That PHP cache thingy can't be doing much, so what's the problem?<br />
<br />
<ol>
<li>Let's look at the web site; wow, it takes 5 minutes for the page to come back ... so this isn't exactly Apache bench territory ... run up a few tabs looking at the home page ... and it starts just returning server timeouts. </li>
<li>So what's happening on the server ... what's killing the box? top tells us that it's Apache killing us here - with 50-odd processes spawning and sucking up all memory and CPU.</li>
<li>So we check out our Apache config and it's the usual PHP-orientated config of MPM prefork. But what are the values set to ... they are for a great big multiprocessor cadillac of a machine, whilst ours is more of a smart car in its scale. </li>
<li>The lesson is that Amazon AMIs are certainly not smart enough to ship different image configs for the different hardware specs of the instances they run on. It appears they default their configs to suit the top-of-the-range instances (since I guess those cost the most). If you have a minimal hardware spec box ... you should reconfigure hardware-related parameters for the software you run on it ... or potentially it will fail.</li>
<li>Slash all those servers, clients etc. values to the number of servers and processes the server can actually deliver. Slightly trial and error here ... but eventually we got MaxClients 30 instead of 500 etc. and give it a huge timeout.<br /><br /><IfModule prefork.c><br />
StartServers 4<br />
MinSpareServers 2<br />
MaxSpareServers 10<br />
ServerLimit 30<br />
MaxClients 30<br />
MaxRequestsPerChild 4000<br />
</IfModule></li>
<li>Now let's hammer our site again ... hurray, it doesn't completely fall over ... one day it may return a page, but it's still horribly, horribly slow, ie. 3 minutes absolute top speed - and the more home page requests, the slower they get.</li>
<li>So let's get some stats; access the page with the browser web dev network tools. What's taking the time here? Hmmm, web page a second - not great but acceptable; JS and CSS 0.25 sec, OK. Images, hmmm, images ... for the home page particularly ... 3-6 minutes ... so basically unusable. </li>
<li>So time to bite the bullet: we know Apache can be slower at serving static pages if it's not optimised for it - especially if resources are limited (its processes have a bigger memory overhead); that's why the Apache foundation has another web server, <a href="http://trafficserver.apache.org/">Apache Trafficserver</a>, for that job</li>
<li>But what's the standard static server (the one that's grabbed half of Apache's share of the web in the last few years)? Yep, <a href="http://nginx.com/">nginx</a> </li>
<li>So let's set up the front end of our site as nginx acting as a reverse proxy to Apache, with Apache just doing the PHP work and nginx serving all images. So modify Apache to serve only on localhost port 8080 and flip the site over to an nginx front end, with the following nginx conf (with the braces closed off, and the cache/cms/uploads location as a regex match) ...<br /><br />server {<br />
listen 80;<br />
server_name moodpin.co.uk;<br />
<br />
location ~ ^/(cache|cms|uploads) {<br />
root /var/www/html/;<br />
expires 7d;<br />
access_log /var/log/nginx/d-a.direct.log;<br />
}<br />
<br />
location ~* \.(css|rdf|xml|ico|txt|gif|jpg|png|jpeg)$ {<br />
expires 365d;<br />
root /var/www/html/;<br />
access_log /var/log/nginx/d-a.direct.log;<br />
}<br />
<br />
location / {<br />
proxy_pass http://127.0.0.1:8080/;<br />
}<br />
}<br />
<br />
Wow, wow - so take that 3-6 minutes and replace it with 1-2 seconds. </li>
<li>So how many images on the home page? About 150, plus more with scrolling ... so that means we have a site that is on average under 0.5% dynamic code-driven content and 99.5% static content/requests per page.<br />That is a very, very static site - hence the 100x faster speed!</li>
<li>So there you go client take that souped up smart car and go </li>
<li>Client replies ... ummm, site's down - server proxy timeout error</li>
<li>Go to Google and check, so we have to make sure that nginx has timeout settings greater than Apache's - and nginx default timeout is 60 seconds</li>
<li>Make the nginx *_timeout settings 10 minutes ... sounds bad; try the site, and it consistently delivers pages in 3 seconds or so. I assume that the app's scrolling page-update requests make the required timeout much longer than the apparent time within which Apache is delivering the PHP</li>
<li>Show the client again, he's happy.</li>
<li>Few days later ... this bit of the sites not working now</li>
<li>Check the code; discover that there is a handful of javascript files used by the system that are not really static - they are PHP templates generating javascript that appears static. Remove js file types from the list of files above in the nginx config. Hurray, the generated javascript is served from Apache PHP now. That bit of the site works again</li>
<li>OK we are done ... don't run Apache bench against the site ... if the client actually gets any users and it can't cope - tell him to upgrade his instance.</li>
</ol>
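For reference, the nginx timeout change in steps 14-15 amounts to a couple of proxy directives. A minimal sketch, with the 600s value matching the 10 minutes settled on above - these would sit inside the location / block that proxies to Apache:

```
# inside the location / { } block proxying to Apache on :8080
proxy_send_timeout 600s;
proxy_read_timeout 600s;
```

proxy_read_timeout is the one that governs how long nginx waits for Apache to respond, so it is the setting that has to comfortably exceed Apache's own Timeout.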
<br />
I hope my tales of devops debuggery are useful to you. Bye!<br />
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-60542736321975404192014-01-13T21:58:00.003+00:002014-01-13T21:58:53.233+00:00Postgres character set conversion woesI had to struggle with sorting out some badly encoded data in Postgresql over the last day or so.<br />
This proved considerably more hassle than I expected, partly due to my ignorance of the correct syntax to use to convert textual data.<br />
<br />
So on that basis I thought I would share my pain!<br />
<br />
There are a number of issues with character sets in relational databases.<br />
<br />
For a Postgres database the common answers often relate to fixing the <a href="http://www.postgresql.org/docs/9.2/static/multibyte.html">encoding of the whole database</a>. So if this is the problem the fixes are often just a matter of setting your client encoding to match that of the database. Or to dump the database then create a new one with the correct encoding set, and reload the dump.<br />
<br />
However there are cases where the encoding is only problematic for certain fields in the database, or where you are creating views via database links between two live databases of different encodings - and so need to fix the encoding on the fly via these views.<br />
<br />
Ideally you have two databases that are both correctly encoded, but just use different encodings.<br />
If this is the case you can just use <a href="http://www.postgresql.org/docs/9.1/static/functions-string.html">convert(data, 'encoding1', 'encoding2')</a> for the relevant fields in the view.<br />
<br />
Then you come to the sort of case I was dealing with, where the encoding is too mashed for this to work - where strings have been pushed in as raw byte formats that either don't relate to any proper encoding, or use different encodings in the same field.<br />
<br />
In these cases any attempt to run a convert encoding function will fail, because there is no consistent 'encoding1'<br />
<br />
The symptom of such data is that it fails to display, so sometimes it's difficult to notice until the system or programming language that is accessing the data throws encoding errors.<br />
In my case the <a href="http://www.pgadmin.org/">pgAdmin</a> client failed to display the whole field, so although the field appeared blank, matches with like '%ok characs%' or length(field) still worked OK. Whilst the normal psql command displayed all the characters except for the problem ones, which were just missing from the string.<br />
<br />
This problem has two solutions:<br />
<br />
1. Repeat the dump and rebuild approach with the correct encoding, but write a custom script in Perl, Python or the like to fix the mashed encoding - assuming that the mashing is not so entirely random as to be unfixable via an automated script*. If it is - then you either have to detect and chuck away bad data - or manually fix things!<br />
<br />
2. Fix the problem fields via pl/pgsql, pl/python or pl/perl functions that process them to replace known problem characters in the data.<br />
<br />
I chose to use pl/pgsql since I had a limited set of these problem characters, so didn't need the full functionality of Python or Perl. However in order for pl/pgsql to be able to handle the characters for fixing, I did need to turn the problem fields into raw byte format.<br />
<br />
I found that the conversion back and forth to bytea was not well documented, although the built-in functions to do so were relatively straightforward...<br />
<br />
Text to Byte conversion => <b>text_field::bytea</b><br />
<br />
Byte to Text conversion => <b>encode(text_field::bytea, 'escape')</b><br />
<br />
So employing these for fixing the freaky characters that were used in place of escaping quotes in my source data ...<br />
<br />
CREATE OR REPLACE FUNCTION encode_utf8(text)<br />
RETURNS text AS<br />
$BODY$<br />
DECLARE<br />
encoding TEXT;<br />
BEGIN<br />
-- single quote as superscript a underline and Yen characters <br />
<br />
IF position('\xaa'::bytea in $1::TEXT::BYTEA) > 0 THEN<br />
RETURN encode(overlay($1::TEXT::BYTEA placing E'\x27'::bytea from position('\xaa'::bytea in $1::TEXT::BYTEA) for 1), 'escape');<br />
END IF;<br />
<br />
-- double quote as capital angstroms character <br />
IF position('\xa5'::bytea in $1::TEXT::BYTEA) > 0 THEN<br />
RETURN encode(overlay($1::TEXT::BYTEA placing E'\x22'::bytea from position('\xa5'::bytea in $1::TEXT::BYTEA) for 1), 'escape');<br />
END IF;<br />
RETURN $1;<br />
END;<br />
$BODY$<br />
LANGUAGE plpgsql;<br />
<br />
Unfortunately the Postgres <a href="http://www.postgresql.org/docs/current/static/functions-binarystring.html">byte string functions</a> don't include an equivalent to a string replace and the above function assumes just one problem character per field (my use case), but it could be adapted to loop through each character and fix it via use of overlay.<br />
So the function above allows for dynamic data fixing of improperly encoded text in views from a legacy database that is still in use - via a database link to a current UTF8 database.<br />
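If instead you take the dump-and-rebuild route (solution 1), the same repairs can be sketched as a small filter in Python. This is a minimal sketch, assuming as above that the stray \xaa byte stands in for a single quote and \xa5 for a double quote - with the bonus that bytes.replace fixes every occurrence in a field, not just the first:

```python
# Repair known bad bytes in a dumped file before reloading it as UTF-8.
# The byte-to-character mapping below is the assumption described above.
FIXES = {
    b"\xaa": b"'",   # single quote stored as the superscript-a-underline byte
    b"\xa5": b'"',   # double quote stored as the Yen/angstrom byte
}

def fix_bytes(raw: bytes) -> bytes:
    """Replace every known mashed byte with its intended character."""
    for bad, good in FIXES.items():
        raw = raw.replace(bad, good)
    return raw
```

Run it over the raw dump (fix_bytes(open(dump_path, 'rb').read())) before loading the result into the correctly encoded new database.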
<br />
* For example in Python you could employ <a href="https://pypi.python.org/pypi/chardet">chardet</a> to autodetect possible encoding and apply conversions per field (or even per character)<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-89652388416302007092014-01-06T14:47:00.002+00:002014-01-06T16:25:56.106+00:00WSGI functional benchmark for a Django Survey ApplicationI am currently involved in the redevelopment of a <a href="http://www.survey.bris.ac.uk/">survey creation tool</a>, that is used by most of the UK University sector. The application is being redeveloped in Django, creating surveys in Postgresql and writing the completed survey data to Cassandra.<br />
The core performance bottleneck is likely to be the number of concurrent users who can simultaneously complete surveys. As part of the test tool suite we have created a custom Django command that uses a browser robot to complete any survey with dummy data.<br />
I realised when commencing this WSGI performance investigation that this functional testing tool could be adapted to act as a load testing tool.<br />
So rather than just getting general request statistics - I could get much more relevant survey completion load data.<br />
<br />
There are a number of more thorough benchmark posts of raw pages using a wider range of WSGI servers - eg. <a href="http://nichol.as/benchmark-of-python-web-servers">http://nichol.as/benchmark-of-python-web-servers</a> , however they do not focus so much on the most common ones that serve Django applications, or address the configuration details of those servers. So though less thorough, I hope this post is also of use.<br />
<br />
The standard configuration to run Django in production is the dual web server setup. In fact Django is pretty much designed to be run that way, with contrib apps such as <a href="https://docs.djangoproject.com/en/dev/ref/contrib/staticfiles/">static files</a> provided to collect images, javascript, etc. for serving separately from the code. This recognizes that in production a web server optimized for serving static files is going to be very different from one optimized for a language runtime environment, even if both are the same web server, eg. Apache. So ideally the site would be delivered via two differently configured, separate Apaches: a fast and light static-configured Apache on high I/O hardware, and a mod_wsgi-configured Apache on large-memory hardware. In practice Nginx may be easier to configure for static serving, or, for a larger globally used app, perhaps a CDN.<br />
This is no different from optimising any web application runtime, such as Java Tomcat. Separate static file serving always offers superior performance.<br />
<br />
However these survey completion tests, are not testing static serving, simpler load tests suffice for that purpose. They are testing the WSGI runtime performance for a particular Django application.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6lYXU9Uh9vOsxvEQKsT059ls4nh_inNz-d-01PfjfBFgHYhyphenhyphenhvTlx9XZX0i8liGmUC5oAqbXltdZ_o6gB2QKc2UHPxpOqNhO3PODUhj1JBD8g9SVQObvjQ11hyz19GADudvRjQ0md043j/s1600/chart_1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6lYXU9Uh9vOsxvEQKsT059ls4nh_inNz-d-01PfjfBFgHYhyphenhyphenhvTlx9XZX0i8liGmUC5oAqbXltdZ_o6gB2QKc2UHPxpOqNhO3PODUhj1JBD8g9SVQObvjQ11hyz19GADudvRjQ0md043j/s1600/chart_1.png" /></a></div>
<h3>
Conclusions</h3>
Well, you can draw your own, for whatever load you require of a given set of hardware resources! You could of course just upgrade your hardware :-) <br />
<br />
However, clearly uWSGI is best for consistent performance at high loads, whilst<br />
Apache MPM worker outperforms it when the load is not so high. This is likely to be due to the slightly higher memory per thread that Apache uses compared to uWSGI.<br />
<br />
Using the default Apache MPM process may be OK, but can make you much more open to DOS attacks, via a nasty performance brick wall. Whilst daemon mode may result in more timeout fails as overloading occurs. <br />
<br />
Gunicorn is all Python so easier to set up for multiple django projects on the same hardware, and performs consistently across different loads, if not quite as fast overall. <br />
<br />
I also tried a couple of other python web servers, eg. tornado, but the best I could get was over twice as slow as these three servers, they may well have been configured incorrectly, or be less suited to Django, either way I did not pursue them.<br />
<br />
Oh and what will we use?<br />
<br />
Well probably Apache MPM worker will do the trick for us, with a separate proxy front-end Apache configured for static file serving.<br />
At least that way, its all the same server that we need to support, and one that we are already well experienced in. Also our static file demands are unlikely to be sufficient to warrant use of Nginx or a CDN.<br />
<br />
I hope that these tests may help you, if not make a decision, maybe at least decide to try out testing a few WSGI servers and configs, for yourself. Let me know if your results differ widely from mine. Especially if there are some vital performance related configuration options I missed!<br />
<br />
<h3>
Running the functional load test </h3>
To run the survey completion tool with a number of concurrent users, and collect stats on this, I wrapped it up in test scripts for <a href="http://locust.io/">locust</a>. <br />
<br />
So each user completes one each of seven test surveys.<br />
The locust server can then be handed the number of concurrent users to test with and the test run fired off for 5 minutes, over which time around 3-4000 surveys are completed.<br />
<br />
The number of concurrent users tested with was 10, 50 and 100<br />
With our current traffic peak loads will probably be around the 20 users mark with averages of 5 to 10 users. However there are occasional peaks higher than that. Ideally with the new system we will start to see higher traffic, where the 100 benchmark may be of more relevance.<br />
<br />
<h3>
Fails</h3>
A number of bad configs for the servers produced a lot of fails, but with a good config these seem to be very low. All 3 x 5-minute test runs for each setup created around 10,000 surveys; these are the actual numbers of fails in 10,000 -<br />
so insignificant, perhaps ...<br />
<br />
Apache MPM process = 1<br />
Apache MPM worker = 0<br />
Apache Daemon = 4<br />
uWSGI = 0<br />
Gunicorn = 1<br />
<br />
(so the fastest two configs both had no fails, because neither ever timed out)<br />
<br />
<h3>
Configurations</h3>
<div>
The test servers were run on the same virtual machine, the spec of which was<br />
a 4 x Intel 2.4 GHz CPU machine with 4Gb RAM<br />
So optimum workers / processes = 2 * CPUs + 1 = 9<br />
<br />
The following configurations were arrived at by tinkering with the settings for each server until optimal speed was achieved for 10 concurrent users.<br />
Clearly this empirical approach may result in very different settings for your hardware, but at least it gives some idea of the appropriate settings - for a certain CPU / memory spec. server.<br />
<br />
For Apache I found things such as WSGIApplicationGroup being set or not was important, hence its inclusion, with a 20% improvement when on for MPM prefork or daemon mode, or off for MPM worker mode.</div>
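As a side note, the optimum-workers rule of thumb used above can be sketched as a one-liner:

```python
import multiprocessing

def optimal_workers(cpus: int) -> int:
    """Common WSGI sizing rule of thumb: 2 * CPUs + 1 workers."""
    return 2 * cpus + 1

# The 4-CPU test VM above gives the 9 workers used in the configs that follow.
print(optimal_workers(4))                            # 9
print(optimal_workers(multiprocessing.cpu_count()))  # for the current host
```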
<div>
<h4>
Apache mod_wsgi prefork</h4>
WSGIScriptAlias / /virtualenv/bin/django.wsgi<br />
WSGIApplicationGroup %{GLOBAL}</div>
<div>
<h4>
Apache mod_wsgi worker</h4>
WSGIScriptAlias / /virtualenv/bin/django.wsgi<br />
<br />
<IfModule mpm_worker_module><br />
# ThreadLimit 1000 <br />
StartServers 10<br />
ServerLimit 16<br />
MaxClients 400<br />
MinSpareThreads 25<br />
MaxSpareThreads 375<br />
ThreadsPerChild 25<br />
MaxRequestsPerChild 0<br />
</IfModule>
</div>
<div>
<h4>
Apache mod_wsgi daemon</h4>
WSGIScriptAlias / /virtualenv/bin/django.wsgi<br />
WSGIApplicationGroup %{GLOBAL}<br />
<br />
WSGIDaemonProcess testwsgi \<br />
python-path=/virtualenv/lib/python2.7/site-packages \<br />
user=testwsgi group=testwsgi \<br />
processes=9 threads=25 umask=0002 \<br />
home=/usr/local/projects/testwsgi/WWW \<br />
maximum-requests=0<br />
<br />
WSGIProcessGroup testwsgi
</div>
<div>
<h4>
uWSGI</h4>
uwsgi --http :8000 --wsgi-file wsgi.py --chdir /virtualenv/bin \<br />
--workers=9 --buffer-size=16384 --disable-logging<br />
<br />
<br />
<h4>
Gunicorn</h4>
django-admin.py run_gunicorn -b :8000 --workers=9 --keep-alive=5
</div>
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-16766578749787678792013-11-21T09:58:00.000+00:002013-11-21T10:11:16.790+00:00Django Cardiff User GroupLast night I went to the <a href="http://www.eventbrite.co.uk/e/django-cardiff-user-group-meeting-registration-9205751651">second meeting</a> of the Django Cardiff User Group.<br />
<br />
This is a sister group to the <a href="https://groups.google.com/forum/#!forum/dbug">DBBUG</a> Bristol based one that I have been attending for the last 5 years. It was organised by Daniele Procida, who started attending DBBUG events a few years ago and has now decided to spread the word over the Severn, in Wales.<br />
<br />
He is also organising the first UK Django conference in a couple of months, <a href="https://djangoweekend.org/">https://djangoweekend.org/</a> - so it's good to see one open source / Python group be the inspiration for spawning another, and one that is perhaps more organisationally active than its progenitor.<br />
<br />
The evening was fun, and it was good to meet and chat with Djangonauts over the border.<br />
<br />
Andrew Godwin, Django core developer / release manager, gave us an update on all the new goodies to be added in <a href="https://docs.djangoproject.com/en/dev/releases/1.7/">Django 1.7</a><br />
So this release is largely about really sorting out the niggling issues with relational database features, and the low level ORM handling of them.<br />
It sees rationalisation of transaction handling with the use of nestable atomic statements, addition of generic connection pooling, and handling of composite keys.<br />
<br />
Daniele demonstrated how to fly a helicopter (a toy one) via the Python command line, although Andrew seemed rather more adept at landing it safely. I gave a little reprise of <a href="http://bit.ly/megadj2013">a talk</a> introducing DBBUG and how a developer can follow the road to their own open source contributions.<br />
<br />
Thanks to everyone involved, I hope to get to the Django weekend too.Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-13968749805640566012013-11-21T09:25:00.003+00:002013-11-28T15:25:10.710+00:00The ten commandments of software procurementFor a medium to large scale organisation with its own IT department, I have found that in today's market the following truths of software procurement apply. Yet they are usually poorly understood by staff in organisations outside the software sector, who often view the world through antique pre-1990 glasses - from before the significant impact of web-based providers and the mixed economy of revenue models of modern software companies ...<br />
<ol>
<li>Software is like any other creative output, it differs radically in quality, modernity and appropriateness - and this is entirely unrelated to its cost. Partly because the majority of today's leading software development companies
are internet companies who do not use software charging for revenue. </li>
<li>So whether or not software is charged for directly via a licensing model is unrelated to whether it is mostly open source or closed source / commercial. Some software is no longer purchasable or the paid for solutions are too poor quality to be viable, compared to the free ones. In such cases other non-financial trading decisions must be part of the procurement arsenal. So policies on data release, etc.</li>
<li>Whether something is open or closed source is entirely irrelevant to its quality, scalability or any other attribute you care to name. These days any software stack is likely to be a mix of both.<br />However given source, tests, community and commit rate can all be checked for the former, it is far easier not to pick a lemon, with open source (not that a non-technical organisation tends to use any of these core indicators for procurement assessment).</li>
<li>Software is basically like literature - there are your Barbara Cartlands and your Shakespeares - unfortunately fewer people are able to read it to work out what quality it is, so it's a book which is generally just judged by its cover - hence the common misconception that software is all roughly the same, or that its quality relates to its cost.</li>
<li>However, the more generic a software application is, the more likely it is that you get better quality for a lower cost - standard economy of scale. <br />Hence Google GMail / Microsoft Office / open source Apache - are good quality - because they are large scale generic applications. <br />The more specific an application is, the more likely the software (whether open source or commercial) will have been put together by a core group of at most 3 or 4 developers, hence have less quality control methods applied, be more buggy and risk being generally of a lower standard.</li>
<li>If the IT Services department of your organisation is not sufficiently powerful to tell the users what they are going to get, despite what they want, it is common that many systems it deploys will require significant customisation - and the more specific they are, the more the customisation.<br />Customisation of outsourced, closed source products is likely to incur significantly greater time and development cost than open source ones, whether customised in house or outsourced. If customised in house, then unless the software has a well designed API, docs etc. - ie. is a widely used generic system from a major company - you usually find that you can only do black box integration and wrapper coding, or resort to breaking license agreements by decompiling. All of which is difficult to maintain.<br />If outsourced, then the code may be open, test-suited and documented within the supplying company, but you are likely to be paying the company around 3 times your in-house cost for a junior developer's customisation / bug-fixing time.</li>
<li>Due to historical reasons some types of software have far superior products that are all in one of these camps than the other ... So open source finance software is poor. Closed source web CMS software and repository software is poor, etc.</li>
<li>Non-technical companies will go through a 5-10 year cycle of outsourcing as much software as possible, then auditing consultancy costs, then ballooning internal development to cut costs, then deciding too much development is in house back to outsourcing again. This cycle wastes a lot of money due to its lack of understanding of the benefits of a stable highly selective mixed economy for software of outsourced, open source, commercial and in-house as being the ideal balance of functionality vs. cost.</li>
<li>Buying mix and match products from integrated product suites is a recipe for high cost, eg. MS Exchange Email and Google Docs, rather than all from one or other supplier.</li>
<li>Lastly and most importantly a non-technical organisation always makes its software procurement decisions based on political reasons*. Never on technical ones. This invariably means that it makes decisions that are significantly more costly, difficult to maintain and less well featured than it could achieve using a purely technical assessment process.<br />Usually they will also fail to have processes to properly trial run alternative products in a realistic manner, or to audit selections once the initial purchase is made. This may partly be because although auditing may save significant costs in the long run, it does introduce a means by which a wrong choice can be flagged up. Unfortunately it is often less embarrassing to make do with a bad choice, until its end of life, than admit a failure. Even though failing and acceptance of it as part of the process, is essential to delivery of quality (rather than make do) systems. </li>
</ol>
<div>
<br /></div>
<div>
Thank you ... rant over :-)<br />
<br />
* political reasons - The salesman managed to persuade someone suitably senior, and technically clueless enough, to believe them. This usually goes in tandem with the company software team's response ... the salesman promised them it did what?? ... make damn sure that isn't in the contract / licensing agreement.</div>
<br />
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-9142360042474751252013-06-03T09:44:00.000+01:002013-06-15T11:48:57.546+01:00IT MegameetYes, MegaMeet may have a slightly cheesy ring to it, but the Bristol <a href="http://bristol.itmegameet.co.uk/">IT MegaMeet</a> was a lot of fun, and a great idea for a regional software community event. Unlike most conferences this one is not for a particular company, language, platform or area of software expertise; instead it brings together all the voluntary community software and technology groups within the region of Bristol, UK.<br />
<br />
There are quite a number as it turns out, so squeezing the conference into a single day resulted in 5 tracks. For a conference organised for the first time last year by a student, to save his course (thanks Lyle Hopkins), it rather put our local University's official efforts in software community engagement to shame - though perhaps it might encourage them to rise to the challenge. (Lyle is a student at one university, and I work at the other.)<br />
<br />
Of the perhaps 30-40 software groups based in and around Bristol, over 20 were represented - a good turnout, partly due to the efforts of one of Lyle's fellow volunteers, Indu Kaila, who did the leg work of attending all the local events and getting various members (like myself) to volunteer to represent their group at the event. I am one of the hundred odd members of Bristol and Bath's Django User Group (DBBUG), started by Dan Fairs, and <a href="http://bit.ly/megadj2013">did a presentation</a> about Python, Django, our group, and the process of contributing to open source - rather a lot to pack into 40 minutes, but it seemed to go down OK.<br />
<br />
The full range of enthusiast groups was present, so I started the day finding out how the four colour theorem from map making applies to optimisation algorithms used in compilers, courtesy of the ACCU, who have been around for a very long time, having started out as a C programming community group. Then near the finish I saw a good talk from the Bristol Web folk reminding me of the core issues to remember in front end web development - as more of a back end developer it can be easy to label this stuff somebody else's job, but with an ever increasing slice of the web development stack being client side these days, that is clearly a bad attitude.<br />
<br />
There was more than a smattering of javascript related talks going on, from big data CouchDB and node.js back end use, through to more client side topics, and a very popular session on flying helicopters via javascript code.<br />
<br />
The talks were rounded off with some about the charity cause that the day was helping to raise funds for, a <a href="http://www.insfriends.org.uk/">cross-Atlantic row</a> in aid of a cervical cancer charity (plus an appeal for graphic design work for another member of the volunteer team, from Ukraine, who is in need of health care).<br />
<br />
I then found myself in the rather comical position of receiving two awards, from the extensive award ceremony, for community involvement etc. Both were really on behalf of other people, but it was fun and led on to the free bar and barbecue, always a popular way to round off a conference.<br />
<br />
So thanks to the Megameet team, if nobody else comes forward, I can always represent <a href="https://groups.google.com/forum/?fromgroups=#!forum/dbug">DBBUG</a>, <a href="http://www.meetup.com/south-west-big-data/">South West Big Data</a> or perhaps another new local group, again next year!<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-17988626052542365132012-11-15T01:13:00.001+00:002012-11-15T01:14:25.579+00:00Cookie law, Cookieless and django tips.<h3>
django-cookieless</h3>
Last week I released a new add-on for django, <a href="http://pypi.python.org/pypi/django-cookieless">django-cookieless</a>. It was a relatively small feature required for a current project, and since it was such a generic package it seemed ideal for open sourcing as a separate egg. It made me realise that I hadn't released a new open source package for well over a year, so this one is certainly long overdue in that sense.<br />
<br />
<h3>
Cookie Law</h3>
It is also overdue in another sense: <a href="http://www.ico.gov.uk/">EU Cookie law</a> has been in force since May 2011, so legally any sites that are used in Europe and set cookies which are not strictly necessary for the functioning of the site must now request the user's permission before doing so. Of course it remains to be seen if any practical enforcement measures will happen, although they were due this summer in the UK, for example. Hence many of the first rush of JavaScript pop up style solutions have come and gone, as a result of user confusion. But for public sector clients particularly, it is certainly simpler to just not use cookies if they are not technically required. It may also, at least, make developers rather less blasé about setting cookies.<br />
<br />
Certainly most people would prefer not to see their browsers filled with deliberate user tracking and privacy invasive cookies that are entirely unrelated to the site's functionality, in the same way most of us don't like being tracked by CCTV everywhere we go. Unfortunately, the current law doesn't have a good technical solution behind it, and hence may well founder over time. This is because cookie control is too esoteric for ordinary users, and even with easy browser based privacy configuration, any technical solution is problematic: a single cookie can be used both to protect privacy (in terms of security - e.g. a CSRF token) and to invade it, and it is entirely down to the specific application's usage where the distinction lies. Invasive tracking can also be implemented via other session maintenance tools, such as URL rewriting, yet because no data is written to the user's browser these are outside the remit of the law. So the law makes little sense currently, and may well be unenforceable.<br />
<br />
Perhaps it would have been better to aim laws at encouraging adherence to set standards of user tracking, starting with compliance with the browser 'Do Not Track' header and perhaps adding some more subtle gradations over time - with the targets of the law being companies whose core business is user tracking for advertising sales etc., starting with Google and working down. Rather than pushing the least transgressive public service sector, as the most likely to comply, into adding a bunch of annoying 'Will you accept our cookies?' pop ups.<br />
<br />
However, even if this law dries up and blows away, for our particular purposes we needed django to cater for any number of sessions per browser (as well as not using cookies for anonymous users).<br />
Django's default session machinery requires cookies, so it ties a browser to a single session - request.session is set against a cookie. But because django-cookieless provides sessions maintained by form posts, it automatically delivers multiple sessions per browser.<br />
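Conceptually, carrying the session in form posts rather than a cookie might be sketched as follows. This is a framework-free illustration with made-up names, not the actual django-cookieless code (which hooks into Django's middleware and rewrites rendered forms to embed a hidden session field):

```python
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"  # stand-in for Django's SECRET_KEY setting

def make_token(session_id, client_hint):
    # Sign the session id together with a client hint (e.g. IP address or
    # user agent), so a token lifted from one client is harder to replay
    # from another - the kind of risk-reduction setting mentioned above.
    msg = "%s:%s" % (session_id, client_hint)
    sig = hmac.new(SECRET_KEY, msg.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (session_id, sig)

def session_from_post(token, client_hint):
    # Recover the session id posted back in the hidden form field, or
    # return None if the signature does not match (wrong client, or a
    # tampered token).
    session_id = token.rsplit(":", 1)[0]
    expected = make_token(session_id, client_hint)
    if hmac.compare_digest(token, expected):
        return session_id
    return None
```

Each rendered form would carry the token as a hidden input, e.g. `<input type="hidden" name="session_token" value="...">` - and since two browser tabs can post back two different tokens, each maintains an independent session.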
<br />
There are a number of security implications of not using cookies, which revolve around the difficulty of preventing session stealing without them. Given this, django-cookieless has a range of settings to reduce that risk, but even so I wouldn't recommend using it for sessions that are tied to authenticated users, which could lead to privilege escalation if the session were stolen.<br />
<br />
<h3>
Django Tips</h3>
I thought the egg would be done in a day, but in reality it took a few days, due to a number of iterations that were necessary as I discovered a series of features in the lesser known (well, to me) parts of django. So I thought I would share these below, in case any of the tips I gained are useful ...<br />
<br />
<ol>
<li>The request object life cycle goes through three main states in django:<br /><b>unpopulated</b> - the request that is around at the time of process_request type middleware hooks, before it gets passed by the URL handler to decorators and then views.<br /><b>partly populated</b> - the request that has session, user and other data added to it (mainly by decorators) and gets passed to a view.<br /><b>fully populated</b> - the request that has been passed through the view to add its data, and is used to generate a response - this is the one that process_response sees.</li>
<li>I needed to identify requests that were decorated with my no_cookies decorator at the time of process_request, but the flag the decorator sets has not been set yet at that point. However there is a useful utility to work around this, django.core.urlresolvers.resolve, which, when passed a path, gives a match object containing the view function to be used - and hence its decorators, if any.</li>
<li>Template tags that use a request get the unpopulated one by default. I needed the request to have the session populated, for the option of adding manual session tags - see the <a href="https://github.com/edcrewe/django-cookieless/blob/master/cookieless/templatetags/cookieless_tags.py">tags code</a>. To have the partly populated request available in tags, <span style="background-color: white; color: #dd1144; font-family: Consolas, 'Liberation Mono', Courier, monospace; font-size: 12px; line-height: 16px; white-space: pre;">django.core.context_processors.request </span><span style="background-color: white; font-family: Consolas, 'Liberation Mono', Courier, monospace; font-size: 12px; line-height: 16px; white-space: pre;">must be added to the</span><span style="background-color: white; color: #dd1144; font-family: Consolas, 'Liberation Mono', Courier, monospace; font-size: 12px; line-height: 16px; white-space: pre;"> TEMPLATE_CONTEXT_PROCESSORS </span> in settings.<br /><br />
</li>
<li>The django test framework's test browser is in effect a complex mocking tool that mocks up the action of a real browser; however, like any mock object, it may not exactly replicate the behaviour one desires. In my case it only turns on session mocking if it finds the standard django session middleware in settings. With cookieless it isn't there, because cookieless acts as a replacement for it, and as a wrapper that uses it for views undecorated with no_cookies. Hence I needed to <a href="https://github.com/edcrewe/django-cookieless/blob/master/cookieless/decorators.py">use a trick</a> - setting a TESTING flag in settings - to allow for flipping cookieless on and off.</li>
</ol>
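Tips 1 and 2 combine in practice roughly like this - a toy stand-in for Django's resolver with illustrative names, not the real django-cookieless code (which uses django.core.urlresolvers.resolve from a middleware class):

```python
def no_cookies(view):
    # The decorator just tags the view function; at process_request time
    # nothing is on the request yet, so middleware must look the view up
    # by path and read this flag instead.
    view.no_cookies = True
    return view

@no_cookies
def survey_view(request):
    return "anonymous form page"

def plain_view(request):
    return "normal page"

# Stand-in for the URL conf: maps a path to its view function, the way
# resolve() returns a match object holding the view to be called.
URLCONF = {"/survey/": survey_view, "/home/": plain_view}

def resolve(path):
    return URLCONF[path]

def wants_cookieless(path):
    # What a process_request hook can do: resolve the path to its view
    # function and inspect the decorator-set attribute.
    return getattr(resolve(path), "no_cookies", False)
```

The same attribute-tagging trick works for any decorator whose effect middleware needs to know about before the view runs.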
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-87141988280593878292012-10-23T15:33:00.000+01:002013-11-21T09:26:17.629+00:00My struggles with closed source CMSI produce a content migration tool for the Plone CMS, called <a href="http://pypi.python.org/pypi/ilrt.contentmigrator">ilrt.contentmigrator</a>. It wraps up zope's generic setup as an easily configurable tool to export content from zope's bespoke object database, the ZODB, to a standard folder hierarchy of content with binary files and associated metadata in XML.<br />
<br />
Some time ago I added a converter to push content to Google Sites, and I have recently been tasked with pushing it to a commercial CMS. Unfortunately, rather than a few days' work as before, this has turned into a marathon task, which I am still unsure is achievable, due to political and commercial constraints.<br />
<br />
So I thought I should at least document my progress, or lack of it, as a lesson for other naive open source habituated developers to consider their estimates carefully when dealing with a small closed source code base of which they have no experience.<br />
<br />
<h3>
Plan A - Use the API</h3>
<br />
So the first approach, which I assumed would be the simplest, was to directly code a solution using "the API".<br />
<br />
"API" is in quotes here since, in common with many small commercial software suppliers, the name API was in fact referring to an automated JavaDoc dump of all their code; there was no API abstraction layer, or external RESTful / SOAP API, to call. It's basically the equivalent of "read the source" for open source projects - but with the huge disadvantage of only legally having access to the bare, largely uncommented, class and method names, not the source itself to see how they worked - or why they didn't.<br />
<br />
Also no other customers had previously attempted to write code against the content creation part of the code base.<br />
<br />
Anyhow back to the project, content import code was written and run, but nothing changed via the web front end.<br />
<br />
It turns out that without a cache refresh the Admin interface does not display anything done via the API, hence it is essential to be able to determine if changes have occurred.<br />
<br />
Similarly if content is not cleared from the waste-basket then it cannot be recreated in the same location, along the lines of a test import scenario.<br />
<br />
Having written the code to set up the cache and other API managers and clear it, I discovered that cache refresh doesn't work via the API; neither does clearing the waste-basket.<br />
<br />
The only suggested solution was to turn the CMS off and on again.<br />
<br />
<h3>
Plan B - Use the API and a Robot</h3>
<br />
Rather than resort to such a primitive approach, I decided to develop a Selenium WebDriver based robot client. This could log into the CMS and run all the sequences of screen clicks it takes to clear the waste-basket and cache after an API delete has been called.<br />
<br />
Eventually all this was in place; now content could be created via the API, and media loaded via the robot (since, again, anything that may use local file system caches or file storage is inoperable via the API).<br />
<br />
The next stage was to create the folder hierarchy and populate it with content.<br />
<br />
Unfortunately at this point a difficult to trace API bug reared its head. If a subfolder is created in a folder via the API, it gets created in a corrupted manner, and blocks subsequent attempts to access content in that folder, because the subsection incorrectly registers itself as content - which is then found to be missing. After much time spent tracing this bug, the realisation dawned that it would not be viable to create anything but a subset of content objects via the API, and everything else would need the robot mixed in to work.<br />
<br />
This seemed like a much less maintainable solution, especially since most pages of the CMS had 50 or more javascript files linked to them, so only a current browser WebDriver client robot would function with it at all. Even then, often the only way to get the robot clicks and submits to work was to grab the javascript calls out of the source and call the jQuery functions directly with the WebDriver javascript engine.<br />
<br />
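The robot steps might look something like this - a sketch only, with hypothetical URLs, field names and element ids (the real selectors are specific to the closed source product), where the driver argument would be a Selenium WebDriver instance:

```python
def empty_wastebasket(driver, base_url, username, password):
    # Log in through the admin screens (URL and field names hypothetical).
    driver.get(base_url + "/admin/login")
    driver.find_element("name", "username").send_keys(username)
    driver.find_element("name", "password").send_keys(password)
    driver.find_element("css selector", "input[type=submit]").click()
    # The empty button is wired up in one of the page's many javascript
    # files, so rather than simulating a click on it, call the jQuery
    # handler directly through the browser's javascript engine.
    driver.get(base_url + "/admin/wastebasket")
    driver.execute_script("jQuery('#empty-wastebasket').trigger('click');")
```

Selenium's find_element(by, value) takes the locator strategy as the first argument ("name", "css selector", etc., matching the By constants), and execute_script runs arbitrary javascript in the page, which is what makes the jQuery-calling workaround possible.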
<h3>
Plan C - Use the import tool and a Robot</h3>
<br />
So having wasted 3 days tracing a bug in the (closed source) API, it was time to take a step back and think about whether there was realistically a means by which an import tool could be created by a developer outside of the company supplying the CMS, i.e. me.<br />
<br />
Fortunately the CMS already had an XML export / import tool, so all we needed to do was convert our XML format to the one used by the company, and the rest of the code was their responsibility to maintain.<br />
<br />
At first their salesman seemed fine with this solution, so I went away and started on that route, leaving the existing code at the point where the sub-folder creation API bug blocks it from working.<br />
<br />
However, on trying out the CMS tool, it also failed to work in a number of ways. The problems that it currently has are listed below, and my present focus is writing a selenium based test suite that will perform a simple folder, content and media export and import with it.<br />
<br />
Once the tool passes, we have confirmation that the API works (at least within the limited confines of its use within the tool). We can then write a converter for the XML format and a driver for the tool - or even revisit the API + robot route, if it's fixed.<br />
<br />
Below are the issues, that need to work, and that the test suite is designed to confirm are functional ...<br />
<h3>
</h3>
<h3>
Content Exporter Issues (status in brackets)</h3>
<ol>
<li>The folder hierarchy has to be exported separately from the content. If you select both at once - it throws an error (<b>minor</b> - run separately)</li>
<li>The hierarchy export appends its data when exported to the same folder, so creating an invalid hierarchy.xml after the first run (<b>minor</b> - could delete on the file system in between) </li>
<li>Hierarchy export doesn't work. It just creates XML with the top level folder title wrapped in tags containing the default configuration, attributes - but no hierarchy of folders. (<b>blocker</b> - need the hierarchy especially to work, since the sub-folder creation was the blocking bug issue with using the API directly)</li>
<li>Content export only does one part of one page at a time, ie. a single content item (<b>minor</b> - this means that it is not a very useful export tool for humans - however via a robot - it could be run hundreds of times to get a folder done)</li>
<li>The embedded media export doesn't work, no media is created (<b>blocker</b> - we need to be able to do images and files)</li>
<li>Content import - a single content item works, and if the media already exists with the right id, that works too. Can't judge media import - since media export fails, I have no format to follow. (<b>blocker</b> - we need media to work as a minimum. Ideally we could import all the parts of a page in one go - or even more than one page at once!)</li>
<li>Hierarchy import - Creating a single section works. Cannot judge for subsections - since the export doesn't work. (<b>pass?</b>)</li>
<li>Configuration changes can break the tool (<b>blocker</b> - the whole point of the project is to provide a working tool for a phased transition of content, it should work for a period of at least two years)</li>
<li>Not sure if the tool can cope with anything but default T4 metadata (<b>minor</b> - A pain but the metadata changes to existing content are part of the API that should function OK directly, so could be done separately to the tools import of content.)</li>
</ol>
Once we have a consistently passing CMS tool, we can assess the best next steps.<br />
<br />
The testing tool has proved quite complex to create too, because of the javascript issues mentioned above, but it now successfully tests the run of an export of a section and a piece of content, checking the exported XML file, and also runs the import for these to confirm the functionality is currently at the level listed above.<br />
<br />
Having been burnt by my experience so far, my intention is to convert the Plone export XML and files to the new CMS's native XML format, push it to the live server, and run the robot web browser to trigger its import - so that eventually we will have a viable migration route, as long as the suppliers ensure that their tool (and underlying API) are in working order.<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-11942490926901148912012-07-09T14:56:00.000+01:002012-07-09T22:50:04.642+01:00Review of talks I attended or was recommended at Europython 2012This is a quick jotting down of recommendations with video links related to this years Europython.<br />
It is really intended for other developers in my workplace.<br />
But in case it has wider relevance I have posted it to my blog. Apologies for the rough format - and remember that with 5 tracks and 2 trainings, I was only exposed to a small portion of the talks.<br />
<br />
<a href="http://www.youtube.com/user/PythonItalia/videos">YouTube channel of all talks</a><br />
<br />
<h3>
Inspiring / Interesting talks</h3>
<br />
<span style="background-color: white;"><a href="http://www.youtube.com/watch?v=9gbUFyPltDs&feature=plcp">Permission or forgiveness</a></span><br />
Linking women programmers, and the approach of Grace Hopper, inventor of the compiler, to the wider implications of that approach: to enable creativity in an organisation, the rules that ensure its survival must be broken. Since middle management's behaviour will inevitably default to blocking permission for innovations, just ignore them wrt. anything technical, for the greater good!<br />
<br />
<a href="http://www.youtube.com/watch?v=OTHggyZAot0&feature=plcp">Music Theory - Genetic algorithms and Python</a><br />
Fun and enthusiastic use of Python to rival the masters of classical music!<br />
<br />
<a href="http://www.youtube.com/watch?v=RwbEEzl3bL4&feature=plcp" style="background-color: white;">State of python</a><br />
<span style="background-color: white;">So a general view of dynamic langs being on the up - ruby, js etc.</span><br />
Seems that static typing snobbery is Guido's bugbear.<br />
<span style="background-color: white;">Increase in popularity shown by 800 at ep12, compared to 500 at ep10</span><br />
<span style="background-color: white;">Then a bunch of internal python language decision stuff, and dealing with trolls</span><br />
<span style="background-color: white;"><br /></span><br />
<a href="http://www.youtube.com/watch?v=lmuhyc4aPYs&feature=plcp">Stop censorship with Python</a><br />
Tor project used to allow uncensored internet in China, etc.<br />
<br />
<a href="http://www.youtube.com/watch?v=BaqaIw2c91o&feature=plcp">The larch environment</a><br />
Ever wanted to write code with pictures rather than boring old text?<br />
Pretty amazing what this PhD student has put together.<br />
<br />
<a href="http://www.youtube.com/watch?v=EwLih26Cjfs&feature=plcp">Aspect orientated programming</a><br />
Possibly inspire you to stretch a paradigm as far as it will go (even to breaking point?).<br />
<br />
<h3>
SW Design and APIs</h3>
<div>
<a href="http://www.youtube.com/watch?v=bv89IOFvn7o&feature=plcp">Best to stick with last year's Python API design talk</a><br />
<br /></div>
<h3>
Scaling or deployment (for Django)</h3>
<h4 style="font-weight: normal;">
<b>Django under massive loads</b><br />Good coverage of scaling django especially wrt. running on Postgresql. Coverage of classic issues wrt. performance and the django ORM. So for example using slice with a queryset always loads the whole queryset into memory. </h4>
<h4>
<a href="http://www.youtube.com/watch?v=gmIG56Hf9dc&feature=plcp" style="font-weight: normal;"> How to bootstrap a startup with Django</a><br /><span style="font-weight: normal;">Coverage of the standard add on components for a medium scale django deployment</span></h4>
<h4>
<span style="font-weight: normal;"> </span></h4>
<div>
<div id="watch-headline-title">
<a href="http://www.youtube.com/watch?v=yxALwwDyWoA&feature=plcp">Bitbucket - <span class="" dir="ltr" id="eow-title" title="Healthy webapps through continuous introspection">Healthy webapps through continuous introspection </span></a></div>
</div>
<div>
Have released geordi, django-dogslow and interruptingcow to handle issues.</div>
<div>
<ul>
<li><span style="background-color: white;"><a href="https://bitbucket.org/brodie/geordi">geordi</a> provides a URL based means of getting full PDF profiling reporting back for pages</span></li>
<li><span style="background-color: white;"><a href="http://pypi.python.org/pypi/dogslow/">dogslow </a>does monitoring and email reporting of hotspots with traceback from live systems.</span></li>
<li><span style="background-color: white;"><a href="http://pypi.python.org/pypi/interruptingcow/">interruptingcow </a>allows setting of nested timeout levels, so expensive operations can be cut short in favour of lighter ones on the web server</span></li>
</ul>
</div>
<div>
<br /></div>
<a href="http://www.youtube.com/watch?v=G2MfIP7GT4M&feature=plcp" style="background-color: white;">Spotify question session</a><br />
Useful insights into scaling - particularly for large scale Python applications using Cassandra.<br />
<br />
Need to be careful with the compaction routine: it halves load capacity due to the spikes it makes, so it sometimes has to jump if overloaded.<br />
Users don't see errors - fixing this last percentage is very hard. Instead they go <span style="background-color: white;">for a 'pretend it's working' approach: just retry to catch failures. </span><span style="background-color: white;">Cassandra - don't upgrade .8 to 1.0. </span><br />
NB: They employed Oracle guys who had worked on the JVM to fix some of Cassandra's issues - well, JVM/Cassandra bugs <span style="background-color: white;">it revealed at load!</span><br />
<div>
<br />
<a href="http://www.youtube.com/watch?v=ENnI2FU3EV4&feature=plcp">What I learned from big web app deployments</a> - how to scale application deployments (zope particularly)</div>
<br />
<a href="http://www.dalkescientific.com/writings/diary/archive/2012/01/19/concurrent.futures.html">concurrent.futures</a><br />
Concurrent programming made easy. The example was bulk processing of a big Apache log. Ditch the old separate threading and multiprocessing libraries for this python 3 package (also backported to 2).<br />
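A minimal version of that log-processing pattern (assuming Apache combined log format, where the status code is the ninth whitespace-separated field) might look like:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def status_counts(lines):
    # Tally HTTP status codes for one chunk of Apache access log lines.
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) > 8:
            counts[parts[8]] += 1
    return counts

def tally(chunks):
    # executor.map() farms each chunk out to a worker and yields the
    # partial results back in order; ProcessPoolExecutor has the same
    # API and is the better fit for CPU-bound parsing (sidesteps the GIL).
    total = Counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for partial in pool.map(status_counts, chunks):
            total += partial
    return total
```

Swapping between threads and processes is a one-line change, which is the main attraction over the older threading / multiprocessing modules.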
<br />
<h3>
Language approaches</h3>
<a href="http://www.youtube.com/watch?v=D3-NZXHO5QI&feature=plcp">Using descriptors</a> - useful to know some of the use cases for these language features<br />
<a href="http://www.youtube.com/watch?v=cMtBUvORCfU&feature=plcp">PyPy JIT</a> - a look under the hood now I know that RPython is not R + Python, but restricted Python.<br />
<a href="http://www.youtube.com/watch?v=x6OL88pzjHQ&feature=plcp">PyPy keynote</a> - coverage of current activity in pypy development<br />
<h3>
Big Data / Mining </h3>
<div>
<a href="http://www.youtube.com/watch?v=_JrZUm9ZHIw&feature=plcp">pandas and pytables</a> - amazing simple API to mine data (in this case financial)</div>
<div>
<br /></div>
<div>
<h3>
<b>Testing</b></h3>
</div>
<div>
<div>
<a href="http://www.youtube.com/watch?v=garsUmsZIac&feature=plcp">Lessons in Testing - Disqus</a></div>
<div>
Useful insights into testing</div>
<div>
Set up included jenkins, nose etc. Run tests concurrently to speed up test suite.</div>
<div>
Note that selenium was painful for them - <span style="background-color: white;">Far too brittle!</span></div>
</div>
<div>
<br /></div>
<div>
<a href="http://www.youtube.com/watch?v=WfyoC0h9QKA&feature=plcp">TDD with selenium</a></div>
<div>
The presenter said this might be a little basic for me - and it was a bit crowded, so I went to other stuff in the end.<br />
<br />
<h3>
Other talks I attended </h3>
<ol>
<li><b><span style="font-weight: normal;">Let your brain talk to computers</span></b></li>
<li>Ask your BDFL</li>
<li>Becoming a better programmer</li>
<li>NDB: The new App Engine datastore library</li>
<li>Advanced Flask patterns (cancelled)</li>
<li>Big A little I (AI patterns in Python)</li>
<li>Increasing women's engagement in Python</li>
<li>Minimalism in Software development</li>
<li>The integrators guide to duct taping</li>
<li>Guidelines to writing a Python API</li>
<li>Composite key is ready for django 1.4</li>
<li>Heavybase: A clinical peer-to-peer database</li>
<li>Beyond clouds: Open source edge computing with SlapOS</li>
<li><a href="http://www.youtube.com/watch?v=OeToCdcv8zo&feature=plcp">Creating federated authorisation for a Django survey system</a> (my talk!)</li>
</ol>
</div>
<div>
<br /></div>
<div>
<a name='more'></a></div>
<h4>
Lightning talks</h4>
<div>
<span style="background-color: white;"><b>Monday</b></span></div>
<div>
<div>
<br /></div>
<div>
1 Transifex django translate .po file generator</div>
<div>
<br /></div>
<div>
2 spotify help page</div>
<div>
<br /></div>
<div>
3 windows conversion</div>
<div>
<br /></div>
<div>
4 egenix pyrun</div>
<div>
<br /></div>
<div>
5 django lukaz comic django re-engineering - 'I regret nothing' - story</div>
<div>
<br /></div>
<div>
6 the scraper wiki guy - julian</div>
<div>
<br /></div>
<div>
7 sqlalchemy tim - add audit history</div>
<div>
<br /></div>
<div>
8 PSF - all can go! Napoli</div>
<div>
<br /></div>
<div>
9 recipe share - flask based openstate.eu</div>
<div>
<br /></div>
<div>
10 fund raiser</div>
<div>
<br /></div>
<div>
11 rededdy online student python math runner</div>
<div>
<br /></div>
<div>
12 kivy.org - ui interface toolkit - mobile</div>
<div>
<br /></div>
<div>
13 python brochure for marketing</div>
<div>
<br /></div>
<div>
14 pyscopg and pypy - help out with moving to pure python ctypes implementation</div>
<div>
<br /></div>
<div>
15 micro framework</div>
</div>
<div>
<br /></div>
<div>
<b>Wednesday</b></div>
<div>
<br /></div>
<div>
<div>
1 PCRE - perl like regex instead of re</div>
<div>
<br /></div>
<div>
2 october pycon s africa</div>
<div>
<br /></div>
<div>
3 docbook to sphinx</div>
<div>
<br /></div>
<div>
4 will hardy - nlp and geocoding</div>
<div>
<br /></div>
<div>
5 moin moin - whoosh search</div>
<div>
<br /></div>
<div>
6 python anywhere</div>
<div>
<br /></div>
<div>
7 pycon uk Sept Coventry</div>
<div>
<br /></div>
<div>
8 controlling telescopes South Africa</div>
<div>
<br /></div>
<div>
9 music analysis with nlp and ai</div>
<div>
<br /></div>
<div>
10 massage man</div>
<div>
<br /></div>
<div>
11 jbart rest based mobile mashup toolkit</div>
<div>
<br /></div>
<div>
12 social fitness game - fitocracy.com</div>
<div>
<br /></div>
<div>
13 pyramid antipatterns & pycon japan</div>
<div>
<br /></div>
<div>
14 django is too simple - new framework made from bits: sqlalchemy, jinja</div>
</div>
<div>
<br /></div>
<div>
<b>Friday</b></div>
<div>
<br /></div>
<div>
Didn't record them I am afraid.<br />
<br />
---------------<br />
<br />
The best lightning talk was the 'I regret nothing' one, and the most impressive was the South African guy controlling the largest radio telescope array in the world live, via a python desktop app, with a video camera showing the array moving around.</div>
<div>
NB: Also Higgs boson got some coverage during the conference since the data sensing input and big data parsing software is largely Python.</div>
<div>
<br /></div>
<h4>
Books from the O'Reilly stall that I noticed may be an entertaining read</h4>
<div>
<div>
Confessions of a Public Speaker - Scott Berkun</div>
<div>
Seven Languages in Seven Weeks<br />
Maybe time to download and read <a href="https://bitbucket.org/BruceEckel/python-3-patterns-idioms/" style="background-color: white;">https://bitbucket.org/BruceEckel/python-3-patterns-idioms/</a></div>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-73982909515239068232012-07-07T22:07:00.002+01:002012-07-08T01:47:30.019+01:00Europython 2012 - hot stuffBeing English I often tend to start a new conversation with the weather. At <a href="https://plus.google.com/u/0/photos/101147159010203967702/albums/5761050058536817313">Europython this week</a> I had good reason: it was hot, 30 - 35 degrees centigrade, whilst at home the UK has been bathed with ... rain, and temperatures of 20 at most. Of course for Florentines it is only a couple of degrees above the norm, so nothing worth talking about. However they were polite enough to respond to this, or any other opening conversational gambit I offered, and in general I found Europython to be a very social event this year in terms of meeting new people - probably more so than any previous Python (or Python framework) conference I have been to.<br />
<br />
At this year's conference I attended on my own, and hence I made a bit more of an effort to be sociable. This along with luckily getting a poster session (that could help justify work sending me!), were prompts to try and start conversations where I may normally have been more reticent.<br />
<br />
The conference itself has a great atmosphere for mixing in any case, with possibly four main themes: core language and implementation development; web applications; data mining and processing; and configuration management and automation tools. Of course within these there are divisions - the investment banking iPython analysers vs. the applied science academic researchers, or Pyramid vs. Django, etc. - but it seems everyone can usefully share ideas, whether they are sales engineers from a large company or a pure hobbyist.<br />
<br />
This inclusiveness was also a theme in itself, particularly wrt. women. Kicking off with Alex Martelli's keynote about <a href="http://web.mit.edu/invent/iow/hopper.html">Grace Hopper</a>, the inventor of the compiler, along with a lot of other stuff. <br />
Unfortunately women are under-represented in the coding sector. At work I think it's around 20% for programmers, but even that is higher than the average - probably because we are public sector / unionised. This is reflected by much lower membership of our local DBBUG Django group, who are mainly drawn from the commercial sector, with only 2 out of around 50 active members. Europython was as bad, at 4% last year, but that has doubled this year to around 60 of the 750 attendees.<br />
<br />
Returning to Python themes. The chance to chat to the data miners was most useful, since we are currently in a state of transition at work. Having been involved in internal systems, particularly CMS from the days when it was evolving 10 years ago, we are now moving to a more pure R&D role.<br />
This means that CMS work is to be dropped, and whilst we want to continue large custom web application work related to research (that's where my poster session on our Django survey system comes in), we also want to be moving towards work that ties up with the University's applied science research - especially big data mining and the like.<br />
So for me the chance to talk (and listen) to people across a range of disciplines was ideal.<br />
<br />
Lastly, I also realised how stale my knowledge of the new features of the language is. Time to get a book on Python 3 - and get back on track I think. Oh and of course many thanks to the Italian Python community and conference organisers for a really great conference - and more than my fair share of free cocktails - which certainly helped break the ice.<br />
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-40915226230932404762012-06-16T10:30:00.003+01:002012-06-16T18:10:57.180+01:00Talks vary, especially mineThis week I had the opportunity to go to a couple of gatherings and deliver two python related talks. The first was at our local <a href="https://groups.google.com/forum/?fromgroups#!forum/dbug">Django Bath and Bristol Users Group</a> (#DBBUG). The second was at the
<a href="http://geug12.port.ac.uk/">Google Apps for EDU European User Group</a> (GEUG12).<br />
<br />
I don't do a talk that often, maybe 5 or 6 times a year, and I guess I share some of the traits of the stereotypical geek, in that I am not a natural extrovert who wants to be up on stage. But at the same time, there is a bit of an adrenalin rush and subsequent good feeling (maybe just relief it's over!) if a talk seems to go OK.<br />
<br />
The first talk went well, although perhaps that was just from my perspective, having consumed a generous quantity of the free beer beforehand, provided by the hosts, <a href="http://p.ota.to/">Potato</a>. It was good to see a lot of new faces at the meeting, and hear about interesting projects from other speakers.<br />
My talk was hardly rocket science, instead just a little session on the <a href="https://docs.google.com/present/view?id=0AaTNd6XgEH_LZHg4ejk2cV81MWhqbXo5aGZu&hl=en_GB">basics of python packaging</a> and why it's a good thing to do. But it seemed to me it was pitched at about the right technical level for the newbies to get something from it, and for the more experienced to contribute comments. It was paced about right and I pretty much followed the thread of the slides, i.e. from 'what is an egg' to C.I. and local PyPI, without slavishly reading out from them. The atmosphere was relaxed and people seemed interested. All in all a good session.<br />
<br />
The second talk unfortunately did not go well, even though I spent more time preparing it - it even triggered me into upgrading my <a href="http://edcrewe.com/">personal App Engine</a> site (see previous blog posting), which as described took a good few hours in itself, to make sure my App Engine knowledge was up to date. So what went wrong? Well maybe, with the previous talk going well with little preparation and no rehearsal, I had started to get a bit blasé. Hey look at me, I can fire off a talk no problem, don't worry about it. However to some extent the quality of any talk depends as much on the audience as the speaker - it's an interactive process - and for a few reasons I didn't feel that I had established that communication. It threw me, I guess, to the point where I was really stumbling along at the end.<br />
<br />
So I thought I would try and analyse some of the reasons, to try to avoid it happening again. These are all common sense and probably in any guide to public speaking - but I thought it worth writing down - even if it's only for my benefit!<br />
<br />
The core reason was that there was a disconnect between the audience and what I was talking about. So I wasn't preaching the gospel to the humanist society - or Zope at a Django conference ;-) I was talking about how you can use <a href="https://docs.google.com/presentation/d/1iWKk8_BRTFP6pPIbL8BM0knRIP0zy6qwsPlaBn1bEyo/present#slide=id.g2526149c_1_14">App Engine as a CMS</a> to a group of University educational technology staff - managers, learning support and some developers.<br />
<br />
So first mistake was that I had adapted a talk that I had delivered to a group of developers a year before. It had gone well before because the audience, like me, were not interested in using the tools - they were interested in building the tools, how these tools could be used to build others - and what implications that had for how we should approach building tools in the future.<br />
<br />
Lesson 1: Try to get an idea of your audience's background - then write the talk tailored for them from scratch (even if it's a previous talk - unless you know the audience profile hasn't changed). Also if a previous demo and talk with questions was an hour - and now it has to be done in 20 minutes - rewrite it, or at least delete half the slides - but don't expect to talk three times faster!<br />
<br />
Lesson 2: If you do feel that you might have pitched a talk at the wrong technical level - and there is no time to rewrite or rethink it - it's probably best to just deliver it as it stands. Moderating all the slides with 'sorry too techie' and rephrasing things in layman's terms on the fly is probably going to be less coherent, and lose the thread of the talk anyhow - unless you are a well experienced teacher.<br />
<br />
My first slide was entitled APIs in transition - hmmm that was a mistake, a few people immediately left the room.<br />
<br />
Lesson 3: The most interesting initial thing to me, coming back to my site, was all the changes that had occurred with the platform. However if you haven't used it before that is irrelevant. So remember, don't focus on what you last found interesting about the topic - focus on the general picture for somebody new to it.<br />
<br />
Lesson 4: Don't start a talk with the backend technology issues. Start it with an overview of the purpose of the technology and ideally a demo or slide of it in use. However backend-focused your topic, it's always best to start with an idea of it from the end user perspective - even when talking to a room full of developers.<br />
<br />
When I got to the demo part I skipped it due to feeling time pressure - actually however this would have been best acting as the main initial element of the talk, with all the technical slides skipped and just referred to for those interested, finishing with the wider implications regarding sites based around mash-ups driven by a common integration framework. So ditching most of the technical details.<br />
<br />
Lesson 5: Don't be scared to reorganise things at the last minute, before starting (see Lesson 2) - if that reorganisation is viable, eg. in terms of pruning and sequence.<br />
<br />
<br />
There was a minor organisational issue in that I started 5 minutes before I was due to end a 20 minute talk, with no clock to keep track. So there was a feeling of having over-run almost from the start. Combine that with people leaving, or even worse people staying but staring at you with a blank 'You are really boring' look!<br />
<br />
Lesson 6: Check what time you are really expected to end before you start, and get your pace right based on this. Keep looking around the audience and try to find at least some people who look vaguely interested! Ignore the rest - it is unlikely you can ever carry the whole audience's interest - unless you are a speaker god - but you need to feel you have established some communication with at least some members of it, to keep your thread going. <br />
<br />
<br />
OK well I could go on about a number of other failings ... but hey, I have spent longer writing about it than I did delivering it. So that will do, improve by learning from my mistakes, and move on.<br />
<br />
<h3>
User groups vary too</h3>
As a footnote another difference was the nature of the two user groups.<br />
<br />
The DBBUG group was established and is entirely organised by its members, like minded open source developers, who take turns organising the meetings, etc. It's really just an excuse for techies to go to the pub together on a regular basis - and not necessarily always chat about techie stuff. It's open to anyone and completely informal.<br />
<br />
GEUG is also largely organised by its members taking turns, but was originally established by Google's HE division for Europe and has a lot of input from their team; it requires attendees to be members of customer institutions. So essentially it's a customer group and has much more of that feel. Members attend as a part of their job. Google's purpose is to use it to expand its uptake in HE - by generating a self supporting community that promotes the use of its products and trials innovative use cases, to some extent feeding back into product development. With a keynote by Google and all other talks by the community members. Lots of coloured cubes, pens, sports jackets - and perhaps a slightly rehearsed informality. But interestingly quite open to talks that perhaps didn't praise their products, or demonstrated questionable use cases regarding the usual bugbear of data protection. Something that is a real sore spot within Europe apparently - the main blocker to cloud adoption in HE.<br />
<br />
Having once attended a Microsoft user group event in Dublin at the end of the 90s, I would say that this was a long way removed from that. The Microsoft event was strictly controlled, no community speakers, nothing but full sales engineer style talks about 'faultless' products, there was no discussion of flaws or even of approaches that could generate technical criticisms. Everybody wore suits - maybe that is just the way software sales were way back when Microsoft dominated the browser and desktop.<br />
<br />
Whereas now community is where it is at. Obviously GEUG felt slightly less genuinely community-driven after DBBUG, but I would praise Google in that they are significantly less controlling over shaping a faultless, technical, suited business face to HE than some of their competitors. Unfortunately for them, non-technical managers with their hands on the purse strings tend to be largely persuaded by the surface froth of suits and traditional commercial software sales - disguise flaws, rather than allow discussion of whether and how they may be addressed.<br />
<br />
In essence the Google persona is carefully crafted to sit closer to an open source one, but as a result may suffer from the same distrust that traditional non-technical clients have for open source over commercial systems. Having said that, they are not doing too badly ... dominating cloud use in US HE. Maybe Europe HE is just a tougher old nut to crack.<br />
<br />
<br />
<br />
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-57161762687674456702012-06-05T15:21:00.000+01:002012-06-05T16:32:41.989+01:00Upgrading a Google App Engine Django appIn my day to day work, I haven't had an opportunity to use Google App Engine (GAE).
So to get up to speed with it, and since it offers free hosting for low usage, I <a href="http://edcrewe.blogspot.co.uk/2011_05_01_archive.html">created my home site</a> on the platform a year ago. <a href="http://www.edcrewe.com/">The site</a> uses App Engine as a base, and integrates in Google Apps for any content other than custom content types.<br />
<br />
Recently I have been upgrading the Django infrastructure we use at work, from Python 2.6 to 2.7 and Django 1.3 to 1.4. I thought that, after a year, it was probably time to upgrade my home site too, having vaguely registered a few changes to App Engine being announced.
On tackling the process, I realised that 'a few changes' is an understatement.<br />
<br />
Over the last year GAE has moved from a beta service to a full Google service. Its pricing model has changed, its backend storage has changed, the python deployment environment has changed and the means of integrating the Django ORM has changed. A raft of other features have also been added. So what does an upgrade entail?<br />
<br />
<h3>
Let's start with where we were in Spring 2011</h3>
<ol>
<li>Django 1.2 (or 0.96) running on a Python 2.5 single threaded CGI environment</li>
<li>The system stores data in the Master/Slave Big Table datastore</li>
<li>Django's standard ORM django.db is replaced by the NOSQL google.appengine.ext.db *</li>
<li>To retain the majority of forms functionality ext.djangoforms replaces forms *</li>
<li><a href="http://code.google.com/p/gdata-python-client/">python-gdata</a> is used as the standard means to integrate with Google Apps via common RESTful Atom based APIs<br /><br />* as an alternative <a href="http://www.allbuttonspressed.com/projects/django-nonrel">django-nonrel</a> could have been used to provide full standard ORM integration - but this was overkill for my needs</li>
</ol>
<div>
To configure GAE a typical app.yaml would have been the following:</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">
application: myapp
version: 1-0
runtime: python
api_version: 1

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin
- url: /.*
  script: main.py
- url: /media
  static_dir: _generated_media
  secure: optional
</code></pre>
<div>
With the main CGI script to run it</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">
import os
from google.appengine.ext.webapp import util
from google.appengine.dist import use_library

use_library('django', '1.2')
os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'

import django.core.handlers.wsgi
from django.conf import settings

# Force Django to reload its settings.
settings._target = None


def main():
    # Create a Django application for WSGI.
    application = django.core.handlers.wsgi.WSGIHandler()
    # Run the WSGI CGI handler with that application.
    util.run_wsgi_app(application)


if __name__ == '__main__':
    main()
</code></pre>
<div>
<br /></div>
<h3>
What has changed over the last year</h3>
Now we have multi-threaded WSGI Python 2.7 with a <a href="https://developers.google.com/appengine/docs/python/python27/newin27">number of other changes</a>.<br />
<br />
<ol>
<li>Django 1.3 (or 1.2) running on a Python 2.7 multi threaded WSGI environment</li>
<li>The system stores data in the HRD Big Table datastore</li>
<li>For Big Table the NOSQL google.appengine.ext.db is still available, but Django's standard ORM django.db is soon to be available for hosted MySQL</li>
<li>google.appengine.ext.djangoforms is <a href="http://code.google.com/p/appengine-admin/issues/detail?id=30">not available any more</a><br />The recommendation is either to stop using ModelForms and hand-crank data writing from plain Forms - or use django-nonrel - but it does have a startup overhead *</li>
<li><a href="http://code.google.com/p/gdata-python-client/">python-gdata</a> is still used but it is being replaced by simpler JSON APIs specific to the App in question, managed by the <a href="https://code.google.com/apis/console">APIs console</a> and accessible via <a href="http://code.google.com/p/google-api-python-client/">google-api-python-client</a>.<br /><br />* <a href="http://www.allbuttonspressed.com/projects/django-nonrel">django-nonrel</a> support has moved from its previous authors - with the Django 1.4 rewrite still a <a href="https://github.com/django-nonrel/django-1.4">work in progress</a></li>
</ol>
<div>
Hmmm ... that's a lot of changes - hopefully now we are out of beta there won't be so many in another year's time! So how do we go about migrating our old GAE Django app?<br />
<br /></div>
<br /></div>
<h3>
Migration</h3>
<div>
Firstly the Python 2.7 WSGI environment requires a different app.yaml and main.py </div>
<div>
Now to configure GAE a typical app.yaml would be:</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">
application: myapp-hrd
version: 2-0
runtime: python27
api_version: 1
threadsafe: true

libraries:
- name: PIL
  version: latest
- name: django
  version: "1.3"

builtins:
- django_wsgi: on
- remote_api: on

handlers:
# Must use threadsafe: false to use the remote_api handler script?
#- url: /remote_api
#  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
#  login: admin
- url: /.*
  script: main.app
- url: /media
  static_dir: _generated_media
  secure: optional
</code></pre>
<div>
With the main script to run it just needing...</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">import os
import django.core.handlers.wsgi
os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'
app = django.core.handlers.wsgi.WSGIHandler()
</code></pre>
<div>
But why is the app-id now myapp-hrd rather than myapp?<br />
In order to use Python 2.7 you have to move to the HRD data store. To migrate the application from the deprecated Master/Slave data store it must be replaced with a new application. New applications now always use HRD.<br />
Go to the admin console and 'Application Settings' and at the bottom are the migration tools. These wrap up the creation of a new myapp-hrd which you have to upload / update the code for in the usual manner. Once you have fixed your code to work in the environment (see below) - upload it. <br />
The migration tool's main component is for pushing data from the old to the new datastore and locking writes to manage roll over.
So assuming all that goes smoothly you now have a new myapp-hrd with data in ready to go, which you can point your domain at.<br />
<br />
NB: Or you can just use the remote_api to load data - so for example to download the original data to your local machine for loading into your dev_server:</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">
${GAE_ROOT}/appcfg.py download_data --application=myapp \
    --url=http://myapp.appspot.com/remote_api --filename=proddata.sql3

${GAE_ROOT}/appcfg.py upload_data --filename=proddata.sql3 \
    --url=http://localhost:8080/remote_api \
    --email=foo@bar --passin --application=dev~myapp-hrd ${MYAPP_ROOT}/myapp
</code></pre>
<div>
<h3>
Fixing your code for GAE python27 WSGI</h3>
Things are not quite as straightforward as you may think from using the dev server to test your application prior to upload. The dev server's CGI environment no longer replicates the deployed WSGI environment quite so well - like the differences between using Django's dev server and running it via Apache mod_wsgi. For one thing any CGI script imports OK as before on the dev server - yet may not work on upload, or may require config adjustments - e.g. ext.djangoforms is not there, and use of any of the existing utility scripts, such as the remote_api script for data loading, requires disabling of the multi-threaded performance benefits. Probably the workaround here, for more production scale sites than mine, is to have a separate app for utility usage from the one that runs the sites.<br />
<br />
If you used ext.djangoforms, either you have to move to django-nonrel or do the data writes directly. For my simple use case I wrote a simple pair of utility functions to do data writes for me, and switched my ModelForms to plain Forms.<br />
<br />
<pre class="CICodeFormatter"><code class="CICodeFormatter">
<span class="blue">def get_ext_db_dicts(instance):</span>
    <span class="green">""" Given an appengine ext.db instance return dictionaries
        for its values and types - to use with django forms
    """</span>
    value_dict = {}
    type_dict = {}
    for key, field in instance.fields().items():
        try:
            value_dict[key] = getattr(instance, key, '')
            type_dict[key] = field.data_type
        except AttributeError:
            pass
    return value_dict, type_dict


<span class="blue">def write_attributes(request, instance):</span>
    <span class="green">""" Quick fix replacement of ModelForm set attributes
        TODO: add more and better type conversions
    """</span>
    value_dict, type_dict = get_ext_db_dicts(instance)
    for field, ftype in type_dict.items():
        if field in request.POST:
            if ftype == str:
                value = str(request.POST.get(field, ''))
            elif ftype == int:
                try:
                    value = int(request.POST.get(field, 0))
                except ValueError:
                    value = 0
            elif ftype == list:
                value = request.POST.getlist(field)
            else:
                value = str(request.POST.get(field, ''))
            setattr(instance, field, value)

<span class="orange">Crude but it allows one line form population for editing ...</span>
mycontentform = MyContentForm(value_dict)
<span class="orange">... and instance population for saving ...</span>
write_attributes(request, instance)
</code></pre>
<div>
However even after these fixes and data import, I still had another task. Images uploaded as content fields were not transferred - so these had to be manually redone. This is maybe my fault for not using the blobstore for them - i.e. since they were small images they were just saved to the Master/Slave data store - but pretty annoying even so.<br />
<br /></div>
<h3>
Apps APIs</h3>
</div>
<div>
Finally there is the issue of the gdata APIs being in a state of flux. Well currently the new APIs don't provide sufficient functionality, and so given that this API move by Google still seems to be in progress - and how many changes the App Engine migration required - I think I will leave things be for the moment and stick with gdata-python ... maybe in a year's time!</div>Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-56877090550660452672012-03-27T20:46:00.029+01:002012-04-19T09:19:35.577+01:00Django - generic class views, decorator classI am currently writing a permissions system for a Django based <a href="http://www.survey.bris.ac.uk/">survey application</a> and we wanted a nice clean implementation for testing users for appropriate permissions on the objects displayed on a page.<br />
<br />
Django has added class based views in addition to the older function based ones. Traditionally, tasks such as testing authorisation have been applied via decorator functions.<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter"><span class="orange">@login_required</span>
<span class="blue">def my_view</span>(request):
    return response
</code></pre><br />
The recommended approach to do this for a class view is to apply a method decorator.<br />
There is a utility converter, method_decorator, to do this with function decorators.<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter">class ProtectedView(TemplateView):
    <span class="orange">@method_decorator(login_required)</span>
    <span class="blue">def dispatch</span>(self, *args, **kwargs):
        return super(ProtectedView, self).dispatch(*args, **kwargs)
</code></pre><br />
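To see what the conversion is actually doing, here is a stripped-down, stand-alone sketch (toy stand-ins, not Django's real implementation): a function decorator expects a plain function taking the request first, so the method version has to bind self before delegating.

```python
from functools import wraps

def login_required(view_func):
    # Toy stand-in for django.contrib.auth.decorators.login_required:
    # treats the request as a dict with an optional 'user' key
    @wraps(view_func)
    def _wrapped(request, *args, **kwargs):
        if not request.get('user'):
            return 'redirect-to-login'
        return view_func(request, *args, **kwargs)
    return _wrapped

def method_decorator_sketch(decorator):
    # Simplified idea behind django.utils.decorators.method_decorator:
    # peel off self, hand the decorator a plain function, call through
    def _dec(method):
        @wraps(method)
        def _wrapper(self, *args, **kwargs):
            def bound(*a, **kw):
                return method(self, *a, **kw)
            return decorator(bound)(*args, **kwargs)
        return _wrapper
    return _dec

class ProtectedView(object):
    @method_decorator_sketch(login_required)
    def dispatch(self, request):
        return 'ok'
```

With these stand-ins, a request dict carrying a user gets 'ok' back, while an anonymous one is bounced to login.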
However this isn't ideal since it is less easy to check all the methods for decorators than just look at the top of the view function as before.<br />
<br />
So why not use a class decorator instead, to make things clearer? Fine, except we do actually want to decorate the dispatch method. But we can add a utility decorator that wraps this up.*<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter"><span class="orange">@class_decorator(login_required, 'dispatch')</span>
<span class="blue">class ProtectedView</span>(TemplateView):
    def dispatch(self, *args, **kwargs):
        return super(ProtectedView, self).dispatch(*args, **kwargs)
</code></pre><br />
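The post doesn't include class_decorator itself, but a minimal sketch of what such a helper might look like (hypothetical names, toy login_required stand-in, not the production code) is:

```python
from functools import wraps

def login_required(view_func):
    # Toy stand-in for Django's login_required, request as a plain dict
    @wraps(view_func)
    def _wrapped(request, *args, **kwargs):
        if not request.get('user'):
            return 'redirect-to-login'
        return view_func(request, *args, **kwargs)
    return _wrapped

def class_decorator(decorator, method_name):
    """Hypothetical helper: apply a function decorator to the named
    method of a class and return the modified class."""
    def _wrap(cls):
        method = getattr(cls, method_name)
        @wraps(method)
        def _decorated(self, *args, **kwargs):
            # Bind self so the function decorator sees a plain function
            def bound(*a, **kw):
                return method(self, *a, **kw)
            return decorator(bound)(*args, **kwargs)
        setattr(cls, method_name, _decorated)
        return cls
    return _wrap

@class_decorator(login_required, 'dispatch')
class ProtectedView(object):
    def dispatch(self, request):
        return 'ok'
```

The decoration is visible at the top of the class, which was the point of the exercise.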
But Django's generic class views contain more than just the TemplateView, they have generic list, detail and update views. All of which use a standard pattern to associate object(s) in the context data. Not only that but the request will also have user data populated if the view requires a login. <br />
<br />
What I want to do is have a simple decorator that just takes a list of permissions, then ensures users who access the class view must login and then have each of these object permissions checked for the context data object(s). So my decorator for authorising user object permissions will be @class_permissions('view', 'edit', 'delete')<br />
<br />
To do this the class_permissions decorator itself, is best written as a class. The class can then combine the actions of three method decorators on the two Django generic class view methods - dispatch and get_context_data. <br />
Firstly dispatch_setuser wraps dispatch, and then login_required wraps that - so at call time login_required runs first, and the user it delivers gets set as an attribute of the class_permissions class.<br />
These must decorate in the correct order to work.<br />
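As a reminder of why the order matters: decorators apply inside-out, so the wrap applied last ends up outermost and runs first at call time. A toy stand-alone sketch (generic labels, not the real decorators):

```python
def tag(label):
    # Toy decorator factory: records which wrapper ran, in call order
    def decorator(func):
        def wrapper(*args, **kwargs):
            return [label] + func(*args, **kwargs)
        return wrapper
    return decorator

def dispatch():
    return ['dispatch']

# 'inner' is applied first, 'outer' second - so 'outer' runs first
wrapped = tag('outer')(tag('inner')(dispatch))
print(wrapped())  # ['outer', 'inner', 'dispatch']
```

Swap the application order and the call order swaps too, which is exactly what would break the user hand-off here.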
<br />
Finally class_permissions wraps get_context_data to grab the view object(s). The user, permissions and objects can now all be used to test for object level authorisation - before a user is allowed access to the view. The core bits of the final code are below - my class decorator class is done :-)<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter">
<span class="blue">class class_permissions</span>(object):
    <span class="green">""" Tests the objects associated with class views
        against permissions list. """</span>
    perms = []
    user = None
    view = None

    <span class="blue">def __init__</span>(self, *args):
        self.perms = args

    <span class="blue">def __call__</span>(self, View):
        <span class="green">""" Main decorator method """</span>
        self.view = View

        <span class="blue">def _wrap</span>(request=None, *args, **kwargs):
            <span class="green">""" double decorates dispatch
                decorates get_context_data
                passing itself which has the required data
            """</span>
            setter = getattr(View, 'dispatch', None)
            if setter:
                decorated = method_decorator(
                                dispatch_setuser(self))(setter)
                setattr(View, setter.__name__,
                        method_decorator(login_required)(decorated))
            getter = getattr(View, 'get_context_data', None)
            if getter:
                setattr(View, getter.__name__,
                        method_decorator(
                            decklass_permissions(self))(getter))
            return View
        return _wrap()
</code></pre><br />
The function decorators and imports that are used by the decorator class above<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter">from functools import wraps
from django.contrib.auth.decorators import login_required
from django.utils.decorators import available_attrs, method_decorator


<span class="blue">def decklass_permissions</span>(decklass):
    <span class="green">""" The core decorator that checks permissions """</span>
    def decorator(view_func):
        <span class="green">""" Wraps get_context_data on generic view classes """</span>
        @wraps(view_func, assigned=available_attrs(view_func))
        def _wrapped_view(**kwargs):
            <span class="green">""" Gets objects from get_context_data and runs check """</span>
            context = view_func(**kwargs)
            obj_list = context.get('object_list', [])
            if not obj_list:
                obj = context.get('subobject',
                                  context.get('object', None))
                if obj:
                    obj_list = [obj, ]
            check_permissions(decklass.perms, decklass.user, obj_list)
            return context
        return _wrapped_view
    return decorator


<span class="blue">def dispatch_setuser</span>(decklass):
    <span class="green">""" Decorate dispatch to add user to decorator class """</span>
    def decorator(view_func):
        @wraps(view_func, assigned=available_attrs(view_func))
        def _wrapped_view(request, *args, **kwargs):
            if request:
                decklass.user = request.user
            return view_func(request, *args, **kwargs)
        return _wrapped_view
    return decorator
</code></pre><br />
Although all this works fine, it does seem overly burdened with syntactic sugar. I imagine there may be a more concise way to achieve the results I want. If anyone can think of one, please comment below.<br />
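One possibly more concise route is a mixin rather than a decorator, so each view just declares its required permissions. The sketch below uses hypothetical stand-in classes in place of the Django machinery (the names and has_perm call are assumptions, not the survey app's actual code):

```python
class FakeUser(object):
    # Stand-in for a Django user with object-level permissions
    def __init__(self, perms):
        self.perms = set(perms)
    def has_perm(self, perm, obj):
        return perm in self.perms

class FakeRequest(object):
    def __init__(self, user):
        self.user = user

class BaseDetailView(object):
    # Stand-in for a Django generic view's get_context_data
    def __init__(self, request, obj):
        self.request = request
        self.object = obj
    def get_context_data(self, **kwargs):
        context = dict(kwargs)
        context['object'] = self.object
        return context

class PermissionCheckMixin(object):
    """Hypothetical mixin: subclasses declare required_perms and the
    check runs in one place, when context data is assembled."""
    required_perms = ()

    def get_context_data(self, **kwargs):
        context = super(PermissionCheckMixin, self).get_context_data(**kwargs)
        objects = context.get('object_list') or [context.get('object')]
        objects = [o for o in objects if o is not None]
        for perm in self.required_perms:
            for obj in objects:
                if not self.request.user.has_perm(perm, obj):
                    raise PermissionError('%s denied on %r' % (perm, obj))
        return context

class ProtectedDetail(PermissionCheckMixin, BaseDetailView):
    required_perms = ('view', 'edit')
```

This trades the decorator syntax for Python's normal method resolution order, at the cost of one extra base class per view.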
<br />
* I didn't show the code for class_decorator since it is just a simplified version of the class_permissions example above.Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-76398113367719465922012-02-12T12:48:00.003+00:002012-06-16T09:00:59.613+01:00Finally I have left behind the printed wordThe printing press was invented by Gutenberg getting on for 600 years ago; as a technology the printed book has had a good innings compared to the cassette tape or the floppy disk. But maybe it's time to recognize that, at least as far as text-only mass paperback printed media goes, it's unlikely to reach that 600th birthday. So I finally decided to stop being a Luddite and buy an e-reader this Christmas, along with 1.3 million other people in Britain. I actually bought it for my partner but I have ended up hogging it.<br />
<br />
So what's so great about an e-reader? Well, it's a good excuse to get back into the classics - anything out of copyright is available free. Over the last month I managed to plough through a book apiece by George Orwell, Thomas Hardy, Charles Dickens, Joseph Conrad and Emily Bronte. But in addition to that, any book is available instantaneously either from repositories or Amazon and its competitors. Or for the less scrupulous there is the twilight world of file sharing networks - where all e-books are available for free.<br />
<br />
Once the habit is formed, reading via a Kindle or other e-reader quickly becomes just as natural as turning pages. I prefer the cheaper LCD grey scale readers without touch screen or keyboard - they are the simplest direct replacement for a paperback, and if current price wars continue they will soon cost about the same as a hardback! For that you get wireless, space for 1500 books and a month's battery life.<br />
<br />
So you may be thinking - OK big deal - but why are you talking about this on a Python blog? Well, the reason is I fancied reading other digital text formats on my e-reader - what if I want a copy of a PDF, some Sphinx docs or even a whole website, for example? I soon came across the open source <a href="http://calibre-ebook.com/">Calibre software</a>, written in Python and C and available on Linux, Mac and Windows. Kovid Goyal may not be Gutenberg - but he has certainly produced a really good e-book management software package.<br />
<br />
Once you have installed the software it sets up that machine as your personal e-book library, to which you can add pretty much any text format you wish, along with news feeds or other HTML content. Plug in your e-reader, press the convert e-book button to translate them to .mobi or whatever, and another to send them to the device. BeautifulSoup is used to help with HTML conversion and there is an API for creating <a href="http://manual.calibre-ebook.com/news_recipe.html">conversion recipes</a>. The software also includes a server to make your personal e-library available to you over the web, book repository browsers, synchronisation tools, handling of annotations, etc.<br />
<br />
Friends talk about missing the tactile experience, the smell, but with a good e-reader and good e-book management software - I can't really justify lugging around those funny old glued-together sheets of bleached wood pulp any more ;-)Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0