<p>Ed Crewe's personal website blog (Ed Crewe, 11 December 2023)</p><h1 style="text-align: left;">Software development with Generative AI</h1><h2 style="text-align: left;">The Current State of AI Software Generation</h2><h4 style="text-align: left;"><span style="font-weight: 400;">The user describes what they want generated, as a snippet of high level programming language code, in standard English, and submits it to the AI tool. So what are they asking the AI to generate, and how does it do it?</span></h4><p style="font-size: medium;"><b>The high level language</b></p><h3><p><span style="font-size: small;"><span style="font-weight: 400;">High level programming languages are </span>human languages<span style="font-weight: 400;"> composed of English and maths symbols, designed for the comprehension and composition of precise computer instructions. The language makes no more sense to a computer than English does. It has to be compiled or interpreted into computer language before it can run. So it may compile to an intermediate bytecode language, and then perhaps to human readable assembly language, before final translation into the unreadable machine code that the computer runs.</span></span></p><p style="font-weight: 400;"><span style="font-size: small;">A programmer learns the high level language and becomes fluent in it. They can read and understand the functionality of that code, with the complexity of the machine specific implementation stripped away, leaving just the precise functional maths and English symbology that describes the computer functionality. 
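</span></p><p style="font-weight: 400;"><span style="font-size: small;">As a minimal sketch of that translation pipeline, Python's standard <code>dis</code> module can show the intermediate bytecode that the interpreter actually executes for a high level function:</span></p>

```python
import dis

# A trivial high level function: readable English-and-maths symbology.
def add(a, b):
    return a + b

# Disassemble it into the intermediate bytecode instructions
# that the Python virtual machine runs.
dis.dis(add)
```

<p style="font-weight: 400;"><span style="font-size: small;">The exact opcodes printed vary between Python versions, but none of them are meant for human fluency in the way the source line is.</span></p><p style="font-weight: 400;"><span style="font-size: small;">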
They think in that code, in order to write it.</span></p><p style="font-weight: 400;"><span style="font-size: small;">Unlike the English language, it can succinctly describe computer functionality in a few lines. <br />Even then, the majority of a programmer's time is spent debugging the high level language, fixing what they have written to be bug free, because it is difficult to think clearly in code, pre-determining all the edge cases etc.</span></p></h3><h3 style="text-align: left;">The AI</h3><h3 style="text-align: left;"><p style="font-weight: 400;"><span style="font-size: small;">A detailed English language description of the required functionality, plus the name of a high level programming language, are submitted to the AI tool.</span></p><p style="font-weight: 400;"><span style="font-size: small;">It does a search of the web, e.g. Stack Overflow etc., for results in that code language. For chatbot use (e.g. ChatGPT) it applies an English language Large Language Model, LLM (a numeric encoding of learning of the English language), to generate a well phrased aggregation of the most popular results that match the English prompt. </span></p><p style="font-weight: 400;"><span style="font-size: small;">For software use (e.g. CoPilot) it works in just the same way, but the LLM learns aggregate translation from English to a high level software language, from code example data such as 
github, to generate what the code syntax might be to match the English description of it.</span></p><p style="font-weight: 400;"><span style="font-size: small;">Finally it returns an untested snippet of generated high level code.</span></p><p style="font-size: medium;">The Non-Developer</p><p style="font-weight: 400;"><span style="font-size: small;">The non-developer pastes it in place and tries to run the program with it included.</span></p><p style="font-weight: 400;"><span style="font-size: small;">They may be able to puzzle out the high level language, but don't naturally think in it, just as people without mathematics skills can only think as far as basic arithmetic and are lost when it comes to complex equations.</span></p><p style="font-weight: 400;"><span style="font-size: small;">It seems to work around 50% of the time. When it fails, they go back to square one and try to rephrase their English prompt. <br /><br />They patch together block after block of prompt generated code. A crazy paving of a program that likely has a number of bugs and inappropriate features in it. But it kind of works, and for the non-developer that is good enough.<br /><br />The code gets pushed out there with all its imperfections, and starts to populate the web of code data that is used to generate the next AI code snippet.</span></p><p><span style="font-size: small;">Or The Developer</span><br /><br /><span style="font-size: small;"><span style="font-weight: 400;">The developer reads the code, understands it, and determines whether it should do what they want, or whether they just want to use some of it as an example.</span><br /><br /><span style="font-weight: 400;">They cut, paste and rewrite it, using it as a hint tool. 
Or an extension to their IDE's existing auto-code generation tools that work using templated code and language / import library searches.</span></span></p><p><span style="font-size: small;"><span style="font-weight: 400;">Hopefully their IDE is set up to clearly distinguish between real code completions and possible generative code completions, since otherwise the percentage of nonsense code created by the generative AI pollutes the 100% reliability of IDE code completion, and harms productivity.<br /></span><br /><span style="font-weight: 400;">Then they run their code and debug as usual.</span></span></p><p style="font-weight: 400;"><span style="font-size: small;">At least 75% of programming time is spent not on writing code, but on making sure that the high level instructions are exactly correct for generating bug free machine code - iteratively refining the lines of code. With code, a single comma out of place can break the whole program. When language has to be so carefully groomed, succinct minimal language is essential.</span></p><p style="font-weight: 400;"><span style="font-size: small;">For many developers, adding an imprecise, non mathematical language that is entirely unsuited to defining machine code instructions, such as English, to generate such code is problematic. It introduces a whole layer of imprecision, complexity and bugs to the process, slowing it right down, along with requiring developers to write a lot more sentences (in English) rather than just quickly typing out the succinct lines of Python (or similar) programming language they have in their head.<br /><br />The generative AI can help students and others who can hardly code yet in a computer language, but can it actually improve productivity for real, full time developers who are fluent in that language?</span></p><p style="font-weight: 400;"><span style="font-size: small;">I think that question is currently debatable. 
Because I believe the goal of adding yet another language, especially one as unsuited as English, to the stack of languages that need to be interpreted for humans authoring computer code, is only useful for people who are far from fluent in the software language.</span></p><p style="font-weight: 400;"><span style="font-size: small;">Once we move beyond error prone early releases of LLMs like ChatGPT-4, then tools such as CoPilot may start to become much more effective at authoring software, and actually produce code that is as likely to work first time, with the same number of bugs, as your average software developer's first cut of the code. We may reach that point within a few years. At which point professional software developers will need to be adept at using it as part of their toolset.</span></p><p style="font-weight: 400;"><span style="font-size: small;">Even so, I believe the whole conception of applying AI to writing software could benefit from more work on a computer centric alternative to the current approach focussed on generating plausible human language responses. That approach only dominates because of all the efforts related to NLP and human interaction. But taking that and sticking it on to writing human software languages is more about creating a revenue stream than attempting to have AI do the main work of software development.<br /></span></p><p style="font-weight: 400;"><span style="font-size: small;">Until then, AI will not be able to replace me as a software developer - only be another IDE tool I need to learn, in time, when it improves sufficiently to increase productivity.</span></p><p style="font-size: medium; font-weight: 400;"></p></h3><h2 style="text-align: left;">Another Way</h2>Copilot and the like currently use the ChatGPT approach of a chatbot front end tied to an English language LLM to generate aggregate search engine results in a human language. 
But there is no domain specific machine learning knowledge about the semantics of the content. So it doesn't understand, and certainly doesn't pre-check, the code - just as ChatGPT doesn't understand the search engine content, since currently there are no domain specific trained models for the content in the loop. So if asked a question about pharmacy, it doesn't plug in one of the AI models that has learnt pharmacy and is used by that industry to aid in the development of medicines. It understands nothing; it is a chatbot, just a constructor of plausible answers based on search popularity.<br />Similarly CoPilot has learnt how to predict what code somebody might be trying to write, but it hasn't learnt how to code.<p></p><p style="font-weight: 400;">This approach cannot lead to AI generating innovative new coding approaches or full self-coding computers, or remove the need for human readable high level programming languages.</p><p><span style="font-size: small;"><span style="font-weight: 400;">There have been experiments with applying </span><a href="https://medium.com/inspiredbrilliance/test-driven-generation-use-ai-as-a-pair-for-programming-6c1e0e4a8b45" style="font-weight: 400;">test driven development to AI generated code</a><span style="font-weight: 400;">, but I have not heard of serious attempts to address the bigger picture...</span><br /><br /></span></p><ul style="text-align: left;"><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Move all functional code writing to be AI only.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Remove the need for any high level computer language for humans to gain fluency in.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Have AI develop software by 
hundreds of thousands of iterative composition TDD cycles. </span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Refactor thousands of solutions in parallel to arrive at the optimum one.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Use AI that understands the machine code it is generating, by training it on the results of running that code. </span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">The ML training cycle must run the generated code, not match outputs against pre-ranked static result training sets.</span></span></span></li><li><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">In addition to the static LLM that encodes the learning of machine code authoring, dynamic training cycles should be run as part of the code composition - task based, ephemeral training models.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Get rid of the wasted effort training AI to understand English, Python, Java, Go or any other existing human language evolved for other tasks.</span></span></span></li><li style="font-weight: 400;"><span style="font-size: small;"><span style="font-size: small;"><span style="font-weight: 400;">Finally we are left with the job of telling the computer what its software should do. <br />We do not want to use English for that, it's way too verbose and inaccurate; similarly we don't want a full high level programming language to do it. We need a new half way house. 
A domain specific language (DSL) for defining functionality only, designed for giving software specifications to AI, that it can use to generate automated test suites.</span></span></span></li></ul><p></p><h3 style="text-align: left;"><span style="font-size: medium; font-weight: 400;">Self-Programming Computers</span></h3><p style="font-weight: 400;">Exploring the last point in more detail...</p><p style="font-weight: 400;">Create a higher level pseudo-code language for describing the required functionality, one that is more English readable than even current high level languages such as Python.<br /></p><p style="font-weight: 400;">Make that functional DSL focus on defining inputs and outputs - not creating the functionality, but creating the black box functional tests that describe what the working code should do.<br /></p><p style="font-weight: 400;">Maybe add tools for a partially no-code approach, with visual generators for the language, e.g. graphical pipeline builder tools, for people who find thinking visually easier than thinking symbolically.</p><p style="font-weight: 400;">The software creator uses the DSL to create an extensive set of functional definitions for a project.</p><p style="font-weight: 400;">The DSL language design and evolution is optimised for LLM interpretation. 
So it has very tight grammatical and syntactical rules that promote accurate generative outputs.</p><p style="font-weight: 400;">A new non-developer friendly high level pseudo code language / rigorous AI prompt writing lingo.<br /></p><p style="font-weight: 400;">Some basic characteristics of the DSL:<br /></p><ol style="text-align: left;"><li>auto-formatting (like Go) minimizing syntactical variation</li><li>To quote Python's creator - 'There should be one-- and preferably only one --obvious way to do it.'<br />But strictly applied, rather than as a vague principle as Python does</li><li>unlike any other high level language, the design needs to be optimized only for specifying functionality - a high level templating language from which test suites are generated<br /></li><li>the language will never be used to implement functionality</li><li>uses simple English vocabulary and ideally minimal mathematical symbology</li></ol><p></p><p style="font-weight: 400;">These DSL definitions are written with the help of an LLM trained on the DSL itself: it helps create its own prompts, and the code creator uses it to refine all the DSL definitions that specify the full functionality. </p><p style="font-weight: 400;">The specification DSL auto generates all the required tests in a low level language.</p><p style="font-weight: 400;">The system should also have a generative AI LLM trained on C or assembly language.<br />This is what creates the actual functional code, by iteratively running and rewriting it against the specification encoded in the tests.</p><p style="font-weight: 400;">The AI tool then generates the tests for that implementation and uses TDD to generate the actual functional code - eventually the system should improve to a level better than most software developers. 
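</p><p style="font-weight: 400;">A hypothetical sketch of that specification-to-tests idea, in Python for illustration only - the spec format and helper names are invented, and the candidate implementation stands in for AI generated code that would be rewritten until the generated tests pass:</p>

```python
import re

# Invented example of a functional specification: inputs and expected
# outputs only - nothing about how the functionality is implemented.
spec = {
    "function": "slugify",
    "cases": [
        {"input": "Hello World", "output": "hello-world"},
        {"input": "Cloud  Postgres!", "output": "cloud-postgres"},
    ],
}

def make_tests(spec, impl):
    """Generate a black box test run from the spec's input/output cases."""
    def run_all():
        for case in spec["cases"]:
            got = impl(case["input"])
            assert got == case["output"], f"{case['input']!r} -> {got!r}"
    return run_all

# Candidate implementation - the part the generative AI would iterate on.
def slugify(text):
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

make_tests(spec, slugify)()  # raises AssertionError until the spec is met
```

<p style="font-weight: 400;">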
The code it writes no longer needs to be read by a human - because a human will be unable to debug it at anything like the speed the AI tool can.</p><p style="font-weight: 400;">So we use generative AI to do the part of the job that actually takes all the time: debugging, refactoring and maintaining the code, making sure it really does what is required functionally, rather than the quick job of writing a first cut of it that might run without crashing.<br /></p><p style="font-weight: 400;">Most importantly we don't introduce the use of the full English language - the language of Shakespeare, the language of puns, double meanings, multiple interpretations, shades of grey, implied feelings and emotions - into a binary world to which it is entirely unsuited.</p><p style="font-weight: 400;">Also we don't need English or high level computer languages in the stack of mistranslation at all, because we are not training the AI to understand human languages. We are training it to write its own machine code language based on defining what behaviour it should implement. <br />BDD / TDD generative AI, if you like.</p><p style="font-weight: 400;">Humans no longer learn complex mathematical process based languages that can be translated into machine code. 
They learn a more generic language for specifying functional behaviour.<br /><br />This gives more freedom to widen the DSL to mature into a general, precise AI prompt language.<br /><br />Whilst allowing computers to evolve more machine learning driven software architectures that are self maintaining, and not so constrained by the models imposed by current human intelligence and coding practice based programming languages.</p><p style="font-size: medium; font-weight: 400;"></p><h3 style="text-align: left;">Could AI take my job?</h3>Perhaps if all of the above were in place, then finally we would arrive at a place where AI could replace traditional software development and high level software languages.<br />With concerted effort it could be within 10 years, if some big companies put serious investment into trying to replace traditional software development.<br />Code monkeys will all be automated. Only software architects would be required, and they would use a new functional specification AI prompt language, not a programming language.<br /><br />Of course, if politicians are scared that dumb ChatGPT can already write as good a speech as they can, plus replicate all the prejudices and errors of its training data and trainers, then setting AI free to fully write software, and itself ... will be way more scary in its long term implications.<div><br />Meanwhile we are currently at a place where it arguably doesn't even improve productivity for an experienced software developer; it only allows non-developers, students and other language newbies to have a go at writing one of the many dialects of human languages known as computer languages. 
<br /><br />Their mix of maths, English, symbols, logic and process may appear more like English than musical notation or pure maths, but sadly they are no more suited to creation by an English language chatbot approach.<p></p></div><h1 style="text-align: left;">Sustainable Coding, and how do I apply it to myself as a Cloud engineer?</h1><p>Ed Crewe, 5 July 2023</p><p> I work as a developer of a Cloud service, Big Animal - EDB's Cloud Postgres product. So I went along to a meetup the other day, a panel discussion on <a href="https://www.meetup.com/bristol-cloud-native-devops/events/293342807/">Leveraging Cloud Computing for Increased Sustainability</a>.</p><p>It got me thinking about this whole issue, and how in a practical sense I could do anything that might reduce the carbon footprint of the service I work on. </p><p>The conclusion I came to was that I don't really know ... and to some extent neither did the panel. Cloud computing may give you some fancy tools to help assess these things, such as <a href="https://learn.microsoft.com/en-us/industry/sustainability/sustainability-manager-overview">Microsoft Sustainability Manager</a>. But there are no black and white answers as to what would make something more sustainable - even the basic one of whether to run it in the cloud or on prem very much depends on what you are running and how. 
One or the other may work out as the more sustainable.</p><p>So on a global scale, just how significant is computing as a percentage of global energy consumption and emissions?<br /></p><h4 style="text-align: left;"><span style="color: #38761d;">The Cloud Climate Issue</span><br /><br /><span style="font-weight: normal;">Comparing today with 30 years ago is useful in terms of seeing where we are going...</span><span style="color: #cc0000;"><br /></span></h4><div><span><div style="text-align: left;"><i><br />1990s vs 2020s IT as a proportion of global energy and emissions</i></div></span></div><p></p><ul style="text-align: left;"><li>1990s: energy 5% (most from office desktop computers and CRTs) - 2% of emissions</li><li>Today: energy 8% (most from personal devices, laptops and mobile; includes 2% for data centres) - 3% of emissions<br /><br /></li><li>Compute power / storage is around 30,000 times greater (by Moore's Law)<br /></li><li>Data has grown from 16 Exabytes (EB) to 10,000 EB, so over 600 times, with the majority in the last 3 years</li></ul><div><h4><span style="color: #cc0000;">Today data centres (hence the Cloud) are causing 2% of emissions - as much as the whole of IT in 1990, and as much as today's aviation industry.</span></h4></div><p></p><p>So working as a cloud engineer looks like a poor choice for someone concerned about climate change!<br /><br />But on the face of it we have been pretty efficient: our compute and storage has massively increased, yet consumption and emissions have only grown by around 50%. The issue is the acceleration in usage, which means we could double energy and emissions in 20 years if nothing was done to improve sustainability.</p><p>The increase in compute power has remained fairly consistent since the advent of the transistor, making Moore's Law more a law of physics than of human behaviour - although of course that technology is now at its limits of miniaturisation. 
So the energy and emissions consumed per Gigaflop of compute have drastically dropped - but now everyone has the compute power of a supercomputer in their pockets. <br />The first supercomputer to reach 1 GFlop was the Cray in the 80s; by the 90s an IBM 300 GFlops supercomputer beat Garry Kasparov at chess - today a Google Pixel 7 phone is 1200 GFlops.<br />Hence our consumption has rather outstripped our increase in compute.</p><p>But it is the <span style="color: #e06666;">explosion in data</span> that is a story of human behaviour. Hand in hand, we have reduced the costs of cloud storage and monetised personal data, with software companies valued on how many customers, and more importantly how much customer data, they have. Recent advances in AI have proved the value of big data lakes for training models to produce practical ML applications. </p><p><span style="color: #e06666;">Combine that with</span> <span style="color: #e06666;">the problem of induced demand. The more and bigger roads you build, the more traffic you get. Cloud puts a six lane highway outside everybody's front door</span>.</p><h4 style="text-align: left;">How do we measure sustainability?</h4><p>Within the world of commercial sustainability and carbon offsetting, there is a basic concept of categorizing things as scope 1-3 emissions.<br /></p><p></p><ol style="text-align: left;"><li>Scope 1 covers emissions from sources that a company owns or controls directly.</li><li>Scope 2 covers emissions that a company causes indirectly, from where the energy for the services it purchases and uses is produced.</li><li>Scope 3 encompasses everything else, such as suppliers' energy use.</li></ol><div>The assumption is that raw energy consumption is not the issue; it is the generation of climate changing emissions to produce that energy that is the metric. <br /><br />This includes mining for minerals to build laptops and data centres, etc. 
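</div><div>As a toy illustration of that scope bookkeeping (every figure below is invented), the difference between covering scope 1 and covering all three scopes can be sketched as:</div>

```python
# Toy scope 1-3 bookkeeping - every figure here is invented for illustration.
emissions = {
    "scope1": 120.0,   # tonnes CO2e from directly owned / controlled sources
    "scope2": 340.0,   # from generating the energy the company purchases
    "scope3": 2100.0,  # everything else: suppliers, devices, travel
}
offsets_and_removals = 150.0  # tonnes CO2e offset or removed

# "Carbon neutral" is read here as covering scope 1 only,
# while "net zero" must cover all three scopes.
carbon_neutral = offsets_and_removals >= emissions["scope1"]
net_zero = offsets_and_removals >= sum(emissions.values())

print(carbon_neutral, net_zero)  # True False
```

<div>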
But if you run your own green energy solar farm next to your data centre, directly powering it without any significant battery storage, plus feeding energy back to the grid, you can be pretty much carbon neutral. You can also fund renewable energy projects and offset. </div><div><br /></div><div>So strangely perhaps, given how cooling can treble hardware life span, the biggest data centres in the world are currently <a href="https://datacentremagazine.com/articles/new-world-record-set-for-largest-solar-powered-data-centre">built in the world's deserts</a> rather than at the North Pole. Solar and wind can be relied upon for more than 100% of their power.<br /><ul style="text-align: left;"><li><b>Microsoft</b> Azure was carbon neutral in <b>2012</b>. It is aiming for 2030 for its whole business (then 2050 for removing all its carbon debt since it was founded in 1975)</li><li><b>Google</b> Cloud became carbon neutral for all its data centres in <b>2017</b>, also aiming for 2030 for all its business.<br /></li><li><b>Amazon</b> is aiming for AWS cloud to be carbon neutral by <b>2025</b>, and as a global retail supplier to do the same for its whole business by 2040.</li></ul><div>Of course this is not possible in most European countries, so most carbon neutral data centres in Europe will be so by purchasing carbon neutral generated energy, rather than actually being neutral in themselves. Although some go a long way down that road, partnering with <a href="https://www.stelliumdc.com/green/?network=g&fm_kwd=sustainable%20data%20centre">renewable energy suppliers</a> and ticking a number of other sustainability boxes. The problem is that if data centres are buying up lots of the renewable energy supply at a premium, then they are removing it from residential or other uses. So this is hardly helping global sustainability, and in reality means they are far from neutral.</div></div><div><br /></div><div>Also, carbon neutral means only that scope 1 is covered. 
Net zero is a standard above carbon neutral: to deal with scope 2 and 3, emissions must be taken out of the atmosphere, so that in practice only a net zero supplier is actually contributing nothing to climate change. No cloud provider <a href="https://www.bloomberg.com/news/articles/2022-11-17/hidden-emissions-from-cloud-computing-pose-net-zero-threat">is net zero</a>.</div><div><br /></div><div>A key point is that the latest enormous scale cloud provider data centres are not the main source of emissions; it is all the older, smaller, more local data centres and machine rooms of servers that are causing the majority of the emissions, in the same way that car pollution is disproportionately down to older vehicles. Of course there is the manufacturing footprint to consider for cars that can last 40 years, but all computer hardware has a much shorter lifespan of 3-5 years. Obsolescence makes increasing the lifespan uneconomic - another green issue that could fill a blog post on its own.<br /><br />So moving to cloud providers' services, and migrating any remaining on prem systems to the cloud, is the sustainable thing to do - as long as what is moved is suited to cloud, or can be re-architected for the cloud.</div><p></p><h4 style="text-align: left;">What changes, as a developer, could improve sustainability?</h4><p style="font-size: medium; font-weight: 400;"><b>Lifestyle</b></p><div style="text-align: left;">So the obvious thing that people think of is the nature of their employer's work. Or perhaps, if your company is a B2B one, whether they have green standards wrt. the clients that they work with. For example it may not make sense working for ExxonMobil, the company with the world's largest emissions. Perhaps the tech industry equivalent would be working on cryptocurrency? 
But Blockchain developers are working on that reputation, even coming up with useful uses for it, such as auditing sustainability usage for scope 2 and 3 verification. </div><p>Over half of internet traffic these days is video streaming, so stopping watching Netflix and scrolling on TikTok, and reading or listening to books instead, is maybe a good behaviour 😉 <br />On the plus side, porn has dropped from its high of 25% of internet traffic down to around 10%, but it has been more than replaced by cat and side hustle millionaire videos it seems. So if your side hustle is being a prolific social YouTuber, it may not be the most ecological of life choices. An hour long short story as digital text is 100 Kb, whilst the same hour as 4k video is a hundred thousand times bigger at 10 Gb.</p><p>On a personal level, my previous employer was more office orientated. It was keen to encourage people into the office with free food etc., so it encouraged commuting in to work, and the maintenance of offices with permanent desk space for every employee, monitors, heating etc., and all the unnecessary extra emissions that entails. My current one is more remote-first.</p><p>In terms of remote work, having experienced pandemic lock downs in a city, when I was going out for a regular cycle for exercise, I can confirm that the reduction in emissions may have only been measured at 20% across the whole of Britain, but in the cities it felt more like 50% - the air was so much more breathable. Whilst maximising WFH is not equivalent to pandemic lock downs, it does make a difference. So changing jobs in the tech sector to a full-time remote position is certainly a worthwhile contribution to sustainability.</p><p>There is the argument that if we all lived alone in big drafty castles which could be turned off for the day by packing into an office a walk away, then remote working is not more sustainable. 
But the reality of IT work today, especially with hybrid working, is that the big, fairly empty building you are more likely to be in these days is the office.<br /><br />So become full time remote if you can. If you have to work for an office based employer, then choosing one that has hot desking, smaller offices and less frequent attendance, and living within walking or cycling distance, are all part of being sustainable wrt. your tech job.</p><h4 style="text-align: left;"><b>Sustainability for a Cloud SaaS company</b></h4><p>I work for a company that produces a cloud marketplace software product, with most engineers working remotely and running no servers at all, just employees' laptops, ie everything we run is via cloud providers' services. We have a few offices globally but only a minority of engineers use them. Since all teams are largely remote, there is no office, no paperwork, no commute and no physical products.</p><p>The same applies to all our other services, e.g. from CI to presentations, from LaunchDarkly to our CRM, from expenses to online mental health support etc. Plus Slack and Zoom for comms.<br /><br />This is a pretty common model - you could call it a server-less company - and it was the same at my previous employer. We sell SaaS and we use it for everything internally too.</p><p>Therefore the assumption is that the problem of working out scope 2 and 3 should be solved by those cloud providers, which to some extent it is ... maybe some less than others. But emissions data can be obtained for scope 2 and 3 from them.</p><p>So that leaves scope 1. This may be hugely affected by how much face to face sales and marketing goes on etc., but that is not my area. So I am purely going to focus on what options there are to improve sustainability wrt. the software architecture, development and deployment practices available for producing a cloud based software service, SaaS. 
Since those are the areas that, as a software engineer, I can influence.</p><p>So let's break that down to some basic elements, and work out what the more sustainable practices and approaches are.</p><p><b>Cloud vs. On Prem</b></p><p>So first things first. Is working for a company that runs everything on cloud, and delivers a cloud based product, a good thing, versus writing software for running in a local server room or data centre?<br /><br />Assuming you use one of the big carbon neutral cloud providers, and are using virtualisation to scale capacity efficiently with usage, then it is likely that a Cloud data centre will be run much more sustainably than a local data centre where you may house your own servers, and certainly more sustainably than a local machine room. So even if you are running a specialised HPC data-centre where the majority of traffic is local ... third party providers will be able to offer more sustainable options.<br /><br />Of course if your software is entirely unsuited to cloud virtualisation (k8s micro-services etc.) or badly designed for it, you could actually be running up way more resources than a local monolithic solution on a few dedicated servers would. So sustainability goes all the way down through the architecture to the lines of code, and what they are written in.</p><p>A whole load of legacy software dumped onto the cloud can be less sustainable (and way more costly) to run than running it locally. 
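</p><p>As toy arithmetic for why scaling capacity with usage matters (all the numbers below are invented), compare an always-on server with virtualised capacity that follows demand:</p>

```python
# Toy arithmetic - all figures invented - comparing a fixed on-prem server
# with cloud capacity that scales with demand over one day.
hours = 24
demand = [0.1] * 18 + [0.9] * 6   # fraction of peak capacity needed each hour

fixed_kw = 0.5                     # a server drawing ~0.5 kW around the clock
fixed_kwh = fixed_kw * hours

# Virtualised capacity: energy roughly proportional to use,
# plus a small idle overhead for the shared platform.
cloud_kwh = sum(0.05 + fixed_kw * d for d in demand)

print(round(fixed_kwh, 1), round(cloud_kwh, 1))  # 12.0 4.8
```

<p>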
</p><p>So another sustainable employment decision is to not work for an organisation that either has a lot of legacy software or has its own servers or data-centres - or at least only if they are bigger than a soccer pitch (ie average DC size or bigger) and have their own adjacent wind farm or other local renewable power source.<br /></p><p>But if, like my employer, everything is run on the three major cloud providers, and there is very little in the way of scope 2 and 3, then is the sustainable business box ticked already?</p><p>Unfortunately not. As mentioned, the providers are not net zero, and ~2% of global emissions come from running data centres. Whilst that may come disproportionately from ones that are not the self-powered giant DCs used by the big cloud vendors, being as efficient as possible wrt. use of Cloud is still the key to being a sustainable tech worker. Especially with the projected growth in Cloud and its emissions being a significant ecological concern.</p><h4 style="text-align: left;">Choice of software languages</h4><div><br />So the reference paper often quoted (<a href="https://www.efinancialcareers.com/news/2023/06/which-programming-language-uses-the-most-energy">and misinterpreted</a>) for software language sustainability is <a href="https://greenlab.di.uminho.pt/wp-content/uploads/2017/09/paperSLE.pdf">this Portuguese University paper</a> on Energy Efficiency across Programming Languages.</div><div><br /></div><div>We could perhaps regard sustainability as the combined goal of minimising energy, time and memory usage (table 5 in the paper). 
So leaving out the older / less mainstream languages we have ...<br /><ol style="text-align: left;"><li>C, Go</li><li>Rust, C++</li><li>Java, Lisp</li><li>C#</li><li>Dart, F#, PHP</li><li>Javascript, Ruby, Python</li><li>TypeScript, Erlang</li><li>Lua, JRuby, Perl</li></ol><div>So on that basis we should write everything in C or Go or possibly Rust, maybe even Java if we are not that eco-friendly.</div><div><br /></div><div>Whilst I do use Go for writing Cloud micro-services, I think the paper's focus on executing a few specific algorithmic performance tests is maybe not an entirely representative approach.<br /><br /></div><div>I have been a Python developer for 20 years and Python is ranked almost last for speed, some 75 times slower than C at the top spot. But even if that were the case across all uses, it ignores the fact that for compute heavy number crunching tasks, Python uses high performance libraries for the core processing functions. NumPy, for example, is half C and runs all the big matrix manipulations in C.</div><div><br /></div><div>Hence the API coding and setup is in Python, but it is not actually running everything 75 times slower than C; it is running maybe, at worst, at half the speed of a pure C program. Plus that custom pure C program could well have taken a lot longer to write and be less reusable, so in total use way more energy than a Python version would. Especially for short lived code and Jupyter interactive coding orientated use cases such as those found in the science and finance sectors. </div><div><br /></div><div>There are further optimisation approaches, such as <a href="https://numba.pydata.org/">Numba</a>, for when Python is being used for fast computational use cases, which can compile straight to CUDA machine code for GPUs. </div><div><div>A paper comparing <a href="https://www.mdpi.com/2076-3417/10/23/8521">Java, Go & Python for IoT</a> decision making. 
It similarly puts Go at the top for efficiency, but places Python above Java (presumably Python was using SciKit, hence C, for performance critical algorithm execution). So clearly the use case and the methodology of the study can make a huge difference to the measured efficiency.</div></div><div><br /></div><div>The same could probably be said for a number of the other languages languishing at the bottom of the table. If measured on executing a real world use case rather than a pure language implementation, the results can be much improved. <br />However for very nimble, lightweight micro-services, a directly compiled language like Go is going to use fewer resources than languages using JIT VMs and/or an interpreter. </div><br />Then there is the core point that most applications in the cloud are not highly intensive calculation based ones. The performance of the majority of applications is more likely to be determined by the data I/O on the network between services and storage, where raw algorithmic performance has little impact.<br /><br />What does matter is that running up parallel processes is simple and lightweight.<br />That core feature, along with the simplicity of Go and its small footprint, was designed specifically for cloud computing. Which means becoming a Go programmer, or at least learning the language, is a good choice for the more sustainable programmer.<br /><br />It is also why ML/Ops will often use Python at the development and testing stages of ML models, but then switch to Go implementations for production.<br /><br /><br /><h4 style="text-align: left;">Software Architecture</h4><br />The architecture that is deployed to cloud has a huge impact on the efficiency of a cloud service, and hence its sustainability. 
Certainly it is going to have much more impact on energy wastage than the raw algorithmic performance of the language used.<br /><br />The architectural nirvana of cloud services is that they are composed of many micro-services, each managing a discrete component of the service's functionality and each able to scale independently, to provide a demand driven, auto-scaled service that ramps up and down whatever components are required from it at any given time. Morphing itself to always provide just sufficient capacity. Not needing stacks of wasteful hot failover servers running without a job to do. Not getting overloaded at peak and failing to deliver on uptime. <br />The ideal sustainable use of hardware, always just enough. Virtualisation allows millions of services to ramp up and down across the shared Cloud providers' vast hardware farms.<br /><br />Clearly, combined with Big Cloud using the latest carbon neutral DCs, this ideal is much more sustainable than each company running its own servers and machine rooms 24/7 on standard grid non-renewable power, for a geo-local service that only approaches full capacity twice a day, and could probably be happily turned off 6 hours a night with nobody noticing.</div><div><br /></div><h4><div style="text-align: left;"><span style="font-weight: 400;"><span style="color: #38761d;">From this perspective, one that the big cloud vendors are keen to promote, Cloud is the sustainable solution, not the problem.</span></span></div></h4><p>Unfortunately that ideal is often very far from the reality.</p><p>Software that is essentially monolithic in design can end up being lifted and shifted to the cloud with little refactoring. At best the application is maybe chopped up into a few main macro-services: the UI as one, a couple of backend services, and the data store as another. Then some work is done to allow each to be deployed to Kubernetes as pods with 3 or more instances in production. 
Ideally the replicas are identical in role and have good load balancing implemented, or multi-master replication for the storage. But often primary-replica is as good as it gets.<br /><br />Essentially an old redundant physical server installation with a few big boxes to run things is being re-implemented via k8s. Then repeat that per customer, usage domain, geo-zone or whatever sharding is preferred. Big customers get big instances - the providers have wide sizing ranges for compute, storage etc.<br /></p><p>It's better than just setting up a VM to replace each of your customers' on prem boxes - and basically changing almost nothing from on prem installs - but any increased sustainability is only that provided by the Cloud vendor's DCs. The solution is not cloud scale with auto-scaling, it's repeated enterprise scale with a lot of fixed capacity in there.</p><p>For these cases maybe consider swapping out some elements with a cloud provider scaled service, eg the storage. Whether that is by using the Cloud provider's solution or a third party vendor's market place one.</p><p>Even for software that has been freshly written for the cloud there can be architectures that consume excessive resources and are overly complex, sometimes because of the opposite issue. With the budget to rewrite for cloud, developers can leap too fast to all the cloud scale solutions - when the service has no need of them. For example deploying multi-region Kafka for event streaming and storage, when data could happily have been sharded regionally and put into a small Postgres or MariaDB cluster. </p><p>Repeatedly firing up a 'micro-service' k8s job that is very short lived but uses a big fat monolith code base, so that 80% of the time and cost of the job is in the startup. 
This is where language matters more: the lighter and faster the language, and the smaller the binary and its memory usage, the better.</p><p>The use of gRPC between micro-services can provide up to 10 times the speed of REST, which can then be reserved just for the whole service's API to the UI and CLI.</p><p>One key indicator of waste is the obvious one of cost. If your new cloud deployed application is generating usage costs that work out far more expensive than the TCO for its on prem deployment, then its architecture is not fit for cloud use. You are burning money and generating excess CO2.<br /><br />Sadly with architecture it all depends on what suits the scale and use cases of a service. So there is no simple fix-it advice here.</p><h4 style="text-align: left;">Development, Testing & Release practices</h4><div><br />Testing and release are probably the most important areas of Cloud software development that could benefit from more sustainable practices. This is perhaps more a pitfall of the rise of Docker and infrastructure as code than of Cloud itself, but the promise of replicable, automatically built software environments has delivered. </div><div><br /></div><div>What it has delivered is a development to production life cycle where developers can spin up any number of their own development environments - even one per Jira ticket, automatically built on its creation perhaps. <br />In order to get merged with the release code, your team chooses to run the full E2E suite. It takes a little while, but we can speed it up by running the 5 clusters we need in parallel for each test environment case. 
These also stand up the whole environment, load it with fixture data and run E2E tests on it, maybe some infrastructure ones too, that fail over the storage and restore from backup.<br />But at least they should automatically tear down the test clusters at the end, whereas dev clusters can hang around for months without cleanup.<br />Then once it passes it goes out to the dev environment, which has its own testing barriers for release to staging. Staging should have the same hardware resourcing as Prod so that it properly tests that the release will work there, perhaps with some load testing - or maybe that is done in another set of clusters.<br />Finally it gets to roll out to production, but maybe for safety prod has a set of canary environments it goes to first, for final validation before it can be rolled out to customers.<br /><br />So to get 20 lines of code into production, we could easily have a process like the above, one that involves spinning up over 10 temporary k8s clusters and uses hundreds of longer lived ones. Just running the E2E and infra tests will take over an hour.</div><div><br /></div><div>This is seen as good practice in the Cloud world. Rigorous testing before release to production. It is pretty common for companies producing a cloud service. Since most software companies now have to produce a cloud version of their product to satisfy the market, that is a lot of companies. For the first year or so, all this will be run at the cost of millions of dollars, with hardly any customers using it. Because that is what you do. Agile: get the product out, then grow and refine it and the team developing it. 
Build it and the customers will come.</div><div><br /></div><div>This is a hugely wasteful process, and it is not far from Crypto in terms of generating emissions, for something that has no practical use yet.</div><div><br /></div><div>If we do end up with a lot of customers, fine. But for services that are not multi-customer architecture, ie big revenue, small customer numbers, there may well be customer specific customisations of the product ❄❄snowflake alert❄❄ So the easy option is the duplication of as many clusters in dev and staging as are in prod, to cater for fully testing those big clients. The result is a great deal of duplicate resource spend.</div><div><br /></div><div>So there should be a lot more consideration of sustainability when establishing the above practices for the development to release cycle.</div><div><br /></div><div>One way to address this issue is to push as much testing as possible down the testing pyramid.</div><div>Unit testing is less useful for cloud, since the whole point of Cloud and micro-services is for each service to do only one thing and knit the full service together via API calls. Which means there may be very little functionality that can be tested by a unit test, since everything needs to be mocked.</div><div><br /></div><div>However that doesn't mean that things cannot be faked; fakes allow fast functional testing of micro-services. Fakes can mean full emulators of services, eg. <a href="https://cloud.google.com/pubsub/docs/emulator">Google pub sub</a>, or running your gRPC services over gRPC's in-memory test fake, <a href="https://pkg.go.dev/google.golang.org/grpc/test/bufconn">bufconn</a>.<br /><br />But the aim should be to establish a full fake test framework that can run up your service on your laptop, ideally without the need of a k8s fake like <a href="https://kind.sigs.k8s.io/">kind</a> to stand it up, since we don't want to fake the deployment environment - just the running code. 
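As a rough sketch of that fake pattern in Go (the Publisher interface, InMemoryPublisher and NotifyOrder names here are hypothetical illustrations, not from any real service or library): the micro-service depends on a small interface, and fast functional tests swap in a working in-memory implementation instead of the real broker or emulator.

```go
package main

import "fmt"

// Publisher is a hypothetical seam the service depends on; in
// production it would be implemented by a real message broker client.
type Publisher interface {
	Publish(topic string, msg []byte) error
}

// InMemoryPublisher is a fake: a real, working implementation that
// just records messages in memory, so tests run on a laptop with no
// broker, emulator or cluster.
type InMemoryPublisher struct {
	Sent map[string][][]byte
}

func NewInMemoryPublisher() *InMemoryPublisher {
	return &InMemoryPublisher{Sent: map[string][][]byte{}}
}

func (p *InMemoryPublisher) Publish(topic string, msg []byte) error {
	p.Sent[topic] = append(p.Sent[topic], msg)
	return nil
}

// NotifyOrder is example business logic under test.
func NotifyOrder(p Publisher, orderID string) error {
	return p.Publish("orders", []byte("created:"+orderID))
}

func main() {
	fake := NewInMemoryPublisher()
	if err := NotifyOrder(fake, "42"); err != nil {
		panic(err)
	}
	// The recorded messages can be asserted on directly.
	fmt.Println(len(fake.Sent["orders"]), string(fake.Sent["orders"][0]))
}
```

The same idea scales up to whole gRPC services run over bufconn: the network and deployment are faked, but the code under test is the real service code.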
Functional tests can then be written that can be used like unit tests to check PRs pass in seconds as part of a git workflow. Running those same tests at regular intervals against full deployments can validate that they correctly mimic them.<br /><br />There should be layers of tests that validate code before the E2E test layer, not just unit and E2E, since otherwise the validity of the code relies on full deployment testing. Full deployment testing should only be run as part of the release process; it should never be run at the PR validation level, as it takes too much time and energy.<br /><br />Developers can have reasonably long lived personal dev clusters, not one per PR, or maybe even resort to shared dev clusters per team, to reduce spinning up excessive amounts of cloud resource for development.<br />Automated shutdown based on inactivity should be the norm.</div><div><br /></div><div>Time should be invested in developing good sample data for non production environments. They should not resort to duplicating all customers, regions or whatever sharding, plus a bunch of test versions of them. If you have more things running in dev than in prod, you are doing things wrong.</div><div><br /></div><div>Another route to take is to have only production as a long lived deployment, with temporary clusters for automated testing, and the use of feature flags to cater for final stage testing in production, in sandboxed feature enabled clusters, prior to full release. This separates deployment from release - the latter can then be moved outside of engineering, once a flag has passed testing and validation.<br /></div><div><br /></div><div>Temporary clusters can use tools such as <a href="https://www.vcluster.com/">vcluster</a> for automated short lifespan k8s clusters, significantly reducing the resource usage and speeding up the spin up time for dev clusters. 
Hundreds of pseudo separate k8s clusters for dev and testing can be run in a single k8s cluster.</div><div><br /></div><h4 style="text-align: left;">Anything else?</h4><p></p><p>The explosion in data is not just all video streaming. Observability is a huge topic; the amounts of telemetry and logging that a well SRE engineered service needs can be overwhelming. Clear management of that, and limits on retention (at least outside of cold - ie tape / optical - storage), are essential: such things as the ability to turn on higher info debugging levels for very restricted sets of environments, and providing valid ML training data sets without filling up data lakes of hot storage.<br /><br />There are still so many more things that impact Cloud sustainability in terms of Cloud applications ... however this blog post is already unsustainably long 😀. So I think I should end it here.<br /><br />The main point is that Cloud can be the sustainable option, but only if cloud engineers put in the effort to make it so, by pushing for the most sustainable architecture, development and release practices in our everyday work.</p><p></p><p><br /></p><p><br /></p><p><br /></p><p><br /></p>Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com1tag:blogger.com,1999:blog-6603837339236629698.post-27929198111075753722023-05-13T15:31:00.007+01:002023-05-19T07:58:47.317+01:00Ten Software Engineering Managers<h2 style="text-align: left;">Engineering management from the perspective of the managed.</h2><div><span style="font-family: inherit;">I have worked in the software industry for many years, in both the public and private sectors, as an individual contributor software developer, SRE and cloud engineer. <br />Along the way I have been managed by 15 different managers, along with having work interactions with around another 100. 
So, naming no names, I thought it worth distilling my managed life into a set of software manager caricatures.</span></div><div>To illustrate what makes a good software manager (and a bad one).<br /><br /></div><div>I accept that, since I have never chosen to become one, I am criticising a skill I have shown no interest in acquiring myself. I have only partially dabbled in it, via senior IC roles, ie Staff Engineer technical advocacy. However it is a good chance to let off steam ... and possibly a software manager may read this and reflect on which characteristics they may want to work on. So guys, it's time for your 360!</div><div><br /></div><h3 style="text-align: left;">Remote managers are better</h3><div>Having worked for many years in offices with my manager sat at the desk behind me, literally looking over my shoulder, I should probably also admit that in recent years I have chosen to work full time remote, ideally with my manager in another time zone, or at least another country. Luckily I got on with those managers that were literally breathing down my neck. But it certainly didn't help me get the job done.</div><div><br /></div><div>Full time remote is probably not so good an option for those who are just starting their career. However for established engineers it does tend to insulate you from the various forms of toxic management out there and lets you get on with the job. It also requires you to be more explicit about the engineering process, collaboration and documentation, and hence be more productive in a more maintainable manner.</div><div><br /></div><div>Bullying, micro-management, undermining, overly personal behaviour etc. can all still happen in front of colleagues on Slack and Zoom. But it is easier to shut down a conversation and walk away virtually, so that the manager has time to calm down and control their behaviour. 
</div><div><br /></div><div>Managers skilled at managing remote international teams have to be adept at targeted, succinct and effective communication. Especially if they only have a few hours in the day when the team members' time zones overlap. </div><div><br /></div><div>So on average I have found remote managers to be better managers. Although that may just be because the UK is renowned for bad management - both anecdotally and in terms of the UK's productivity ranking. So managers from more productive countries than the UK are likely to be from a better management culture - hence better managers.</div><div><br /></div><h3 style="text-align: left;">Peer level managers are better</h3><div>In a traditional hierarchical organisation, the CTO is the most senior manager and so remunerates more highly those who are managers like themselves. So particularly in the public sector there is an unwritten rule that even the most junior manager must be paid more than the most senior engineer.<br /><br />This approach naturally leads to some rancour amongst senior technical staff who want to stay technical. It also devalues technical skills, since to increase their salary technical staff must eventually give up all their years of technical skills and somehow gain 10 years of skill in people management overnight. Of course this doesn't happen, so instead you get very novice people managers with a lot of largely irrelevant technical skills, and perhaps a personality totally unsuited to enabling their team.</div><div><br /></div><div>It is easy to filter out such an organisation. Just get on one of the company feedback sites, eg. Glassdoor, Fishbowl etc. Check what the top IC engineers' salary range is and check that it is higher than the junior management range. Ideally you should expect the top IC grade, eg. senior technical architect, to be paid around 50% more than an engineering manager. 
But take it as a red flag if there is no technical grade that is paid more than any management grade.</div><div><br /></div><div>Because it means the organisation doesn't value your engineering skills, and you are likely to be managed by someone who doesn't value them either and may regard themselves as your superior. So why would you work there? Surely better to work for a manager who treats you as a peer, in an organisation that values your skillset.</div><div><br /></div><h3 style="text-align: left;">Most people quit their job because they have a bad manager</h3><div>The surveys tell us 43% leave their job because of a bad manager, with the next most important reason being general toxic culture / under appreciation, whilst pay and progression comes in third.</div><div>We all want to get the number 1 manager, but unfortunately we often end up with 10.</div><div><div>So most people leave their jobs because of getting one of the bottom ranked manager types.</div><div><br /></div><div>HR's job is to minimise disruption to the company, so they will tend to take the side of the more senior employee unless that employee's behaviour is clearly proven to be detrimental to the organisation on a wider scale.</div><div><br />A few junior employees reporting them for incompetence or abusive behaviour does not usually qualify.</div><div>So if you do complain it is unlikely to improve the situation, and may well make it worse if the manager is informed that you complained about them. Sadly I have not personally heard of any case where complaints about bad management resulted in resolving a problem, but I definitely have heard of cases where it made things a lot worse for the remaining time before the complainant's notice period was done.<br /><br />So unless a more senior member of your organisation is on your side and decides to deal with the problem manager, it is best to leave your job if you have a bad manager. For a big company that may just mean changing departments. 
But leaving your job is safest in terms of removing yourself from a toxic environment, plus you can honestly report the manager as being the reason for leaving in your exit survey. It is a chance for you to help your ex-employer, though you shouldn't expect them to act upon it immediately. But given a sufficiently high attrition rate for a manager's team, the higher level managers should hopefully have enough sense and self interest to deal with their failing colleague.</div></div><div><br /></div><h2 style="text-align: left;">Ten Software Managers</h2><div><b><span style="color: #38761d;">1. The Enabler </span></b></div><div><br /></div><div>The ideal technical manager is the enabler. They have better personal skills than software skills.<br />They are most likely either to have never been an engineer, ie they entered the industry as a professional technical manager, or to have been an engineer some years ago who was more interested in the people than the code, so changed direction fairly early in their career.<br />They will be well aware of the wider environment of the organisation and the stakeholders and drivers for the technical work. A great communicator, with knowledge of all the systems and processes, and so of how to unblock things and get stuff done procedurally. Plus a talented scrum master.<br /> </div><div><i>Most likely to say:</i> How should we fix that process for you?</div><div><i>Catchphrase:</i> Thanks for your contribution 😊</div><div><br /></div><div><div><b>2. The Big Cheese</b></div></div><div><br /></div><div>You may find at some point in your career you get a manager who is actually a much more senior person in the organisation than their surface role as an engineering team manager. 
Maybe they founded the whole company, or they are the head of a large department with other non-software engineering work.</div><div><br /></div><div>To be high in the organisation they are likely to be a better manager than the average manager you might expect, with great people skills.</div><div><br /></div><div>(Obviously this rule may not hold once you get to CEO level. I hope nobody who ended up with Elon Musk as their direct line manager is reading this ... although I don't think he was interested enough in software to directly manage software engineers, given he gave up coding at 20 before getting any formal training in it.)<br /><br />They are likely to be more of a leader than a manager, and likely to be particularly good at fostering the development of their engineers. They are also going to have all the contacts to unblock any issue that may arise, plus have their finger on the pulse of high level changes and strategy that may impact your job.</div><div><br /></div><div>They may still have a surprisingly high level of technical understanding of the company's software, but as a senior manager they also understand that their job is all about enabling an even higher level of understanding, and technical decision making about it, in their engineers.</div><div><br /></div><div>On the downside, they probably don't actually have that much time to devote to you personally, so don't be surprised if they fail to act on things you have suggested to them. If there is any issue they make damn sure somebody in the team takes responsibility for sorting it out. So be prepared to be volunteered by them.</div><div><br /></div><div><i>Most likely to say:</i> How are you aiming to progress in the company?</div><div><i>Catchphrase:</i> Isn't it amazing what we are building at (fill in organisation name)? 🏆<br /></div><div><br /></div><div><b>3. The Bro </b></div><div><br /></div><div>They were happily doing their engineering job when the manager for their team left. 
They were not necessarily the most technical member of the team, but they were the one who got on with everyone and the one that everyone in the team was happy enough to have as a manager. So they took the job.</div><div><br /></div><div>They want to be your best friend and genuinely aim to protect you, as a member of their team, from problems or issues that are coming down from higher up the management hierarchy.</div><div><br /></div><div>They are reasonably technically aware and skilled, but don't try to make the technical decisions or deal with issues unless nobody in the team steps up to the plate; then they take the task on themselves.</div><div><br /></div><div>They are just one of the gang, but your manager too.</div><div><br /></div><div><i>Most likely to say</i>: Let me buy you a beer ... umm, sorry, everyone in the team has been made redundant.</div><div><i>Catchphrase</i>: Yeah, what the hell are management up to 🍻</div><div><div><div><br /></div></div></div><div><b>4. The Middle Manager</b></div><div><b><br /></b></div><div><div>They used to be a techie, many years ago, but they weren't really interested in tech and hence were probably pretty mediocre at their job. But thankfully they got on the management track, result! They love the politics and intrigue of management way more than technical details. They have good people skills but find it hard to hide the fact they have absolutely no emotional investment or interest in technology. <br /><br />They literally couldn't care less if the head of the organisation declared that from this day forth all software in the company would be written in Cobol and all open source banned from use. If that is what their boss says, then their job is to listen to their team moan and complain about it, and then tell them that is what they will be using. 
Since any technical objections the team gives are meaningless to them.</div><div><br /></div><div>On the plus side they do not micro-manage, they appreciate their team members' skills, and they are good at bringing those skills out.</div><div><br />The middle manager likes people and wants to please them, but knows that their job is only to please their superiors. From their team they just need compliance, and being good at their job is all about bringing their team onside with whatever the higher-ups require, however outlandish.</div><div><br /></div><div><i>Most likely to say</i>: Sorry it wasn't my choice, but come on, we need to get on with it.</div><div><i>Catchphrase</i>: I raised your issue with senior management, but no go. 😢</div><div><br /></div><div><b><span style="color: #ffa400;">5. The Invisible Man</span></b></div><div><br /></div></div><div>The invisible man works in a big organisation and knows the value of his super power. He used to do a bit of useful work, but that was years ago, when he still took enough interest in his job to get to the bottom rung of the management ladder. But over the years he realised he could get away with doing less and less actual work. He mastered quiet quitting years ago, before it became a thing. </div><div><br /></div><div>Since then he (and his boss) has worked out that if he gets a team of reasonably senior self starter engineers, they don't actually like or want to be managed. So a team that is difficult and opinionated for some managers is actually perfect for the invisible man. Ideally if they are a distributed team, then he can "manage" them remotely, with the minimum level of work. Send the odd email maybe, do one or two Zoom calls a week, and his work is done.</div><div><br /></div><div>His team may not respect him, they may even play jokes on him. But he so doesn't give a crap about work that he won't even notice them doing it. 
As a manager the invisible man is the mid-point of the top 10.</div><div>He is neither a good nor a bad manager; he is like having no manager at all. He will never support, challenge or rebuke you. At least nobody ever quits their job because they have the invisible man as their manager.</div><div><br /></div><div><i>Most likely to say</i>: Sorry, just had to step out for a minute.</div><div><i>Catchphrase</i>: Keep up the good work guys. 👻</div><div><br /></div><div><b>6. The Over Employed</b></div><div><br /></div><div>The over employed is a people pleaser. They like to say yes to everyone, including their managers and their team. They like to be seen dashing about doing things. So sure, they will do that for you tomorrow ... but tomorrow never comes! Because the over employed is too busy. They may even have got themselves a second job on the sly, thinking they can juggle both at once. They are such optimists; of course it will work out. </div><div><br /></div><div>They carry on saying yes to all those tasks that you need them to get done to unblock your work, just as they do to everybody else. So sure, they will sort out your performance review, and talk to the other manager that you need info from. But somehow they never seem able to deliver on time, if at all. <br /><br /></div><div>They will be there at 3am the night before a major deadline, chucking something together that is not quite finished and misses some vital component. But it is good enough, it should do what is needed.<br /><br /></div><div>Poor people pleaser, working their arse off to please people, so why is nobody that pleased with them?<br />But they stay cheerful, not going to let those moaners drag them down.<br />Oh well, if colleagues get too annoying they can always bail out. 
Get a job somewhere else and leave behind all those trivial little tasks that people kept bugging them with.</div><div><br /></div><div><div><i>Most likely to say</i>: Yes sure, I will do that.</div><div><i>Catchphrase</i>: Hiya guys, why the long faces? 🤯</div><div><br /></div></div><div><b><br /></b></div><div><div><b>7. The Team Player</b></div></div><div><b><br /></b></div><div>For the engineers in their team the Team Player is one of the best bosses they have had. They always have their backs, support them and persuade those above them to funnel more resources and authority to their team.</div><div><br /></div><div>They are ambitious and aiming to rise higher up the management tree, but loyal to their guys. They know their team is really the only one in engineering that is run properly. It is also the one doing the work that matters. They make sure they sweet-talk those above them and dedicate a reasonable portion of their time to making sure they know that they and their team are the keystone keeping the organisation running.</div><div><br /></div><div>They know that anybody looking to advance their career should be spending a good portion of each working day working on their own personal advancement. Don't be the fool who spends all day every day just doing the organisation's work.</div><div><br /></div><div>Unfortunately their highly competitive nature and self-belief can lead to self-delusion. They start to believe their own self-promotional narrative. This leads them to be contemptuous of those annoying flakes in other teams who are not doing anything of real value. Though generally energetic and positive, those outside their circle and below their grade get their aggressive, bullying, dismissive and unpleasant side.<br /><br />This behaviour also distorts the true importance and funding that the organisation should be devoting to their team's remit, to the detriment of other areas. 
So they can cause problems for the company as a whole, as well as for morale outside of their team.</div><div><br /></div><div><i>Most likely to say</i>: My team has got that, we will save the day.</div><div><i>Catchphrase</i>: Get out of my way, unlike you, I have a real job to do. 💪</div><div><br /></div><div><b><br /></b></div><div><div><b>8. The Bureaucrat</b></div></div><div><b><br /></b></div><div>Once upon a time, long long ago, some software companies decided they wanted to sell their wares to the ancient hierarchical institutions of government and learning. Those institutions believed in traditions and rules and processes and making things quantifiable. So something that did that for software management was perfect. So the companies came up with traditional names signifying regal wealth and power - Prince - along with naming their software Enterprise software, signifying new, wealth generating and boldly taking the initiative. That was in the 1980s.</div><div><br /></div><div>It was the perfect sales pitch for these outdated institutions and they bought into it wholesale. Although it took them about 25 years to get around to it, old institutions are like that. <br /><br />So their procurement processes and software and its life cycle and management were bound into reams and reams of bureaucratic processes. The IT managers in those institutions were groomed in the ways of Prince 2 and ERP and ITIL and all the rest of the snake oil the companies had invented. They devoted all their money and time to the training and meetings and processes around it. 
<br /><br />As far as the engineers in those institutions were concerned, a few of the processes were useful but that was far outweighed by the whole bureaucratic burden and costs they were wrapped in.</div><div><br /></div><div>The manager spoke at length for years at far too many meetings about the process and the newly procured systems, but unfortunately the quality and features of the institution's systems seemed to have gotten worse, whilst their cost became much, much greater.<br />The institution employed more and more managers, although they just managed projects, not people. But eventually there were so many that they needed managers for them too.<br />But they hired no more engineers.<br />Eventually the institution decided it didn't really need any in-house software engineers at all; why were they writing software when they should be buying proper Enterprise software from the companies?<br /><br />So the engineer realised they had to leave the kingdom of the Prince 2 and its manager and go to live in a different place altogether, where their business was making software. Strangely in the software republic they had never even heard of Prince 2. They vaguely knew of PMP, but nobody in their right mind would use it to make or manage software. </div><div><br /></div><div><i>Most likely to say</i>: I am sorry I cannot talk to you, until you have filed a change request.</div><div><i>Catchphrase</i>: Our KPIs show that we are on course for all our CSFs 👑</div><div><br /></div><div><b><br /></b></div><div><div><b>9. The Technical Superior</b></div></div><div><b><br /></b></div><div>The technical superior is at least a grade or two above you and always interacts with you as your superior.</div><div>They were an engineer and still secretly preferred being an engineer to their current job. They were never the best technical engineer in a team, so they compensated for that by imagining they were the best at seeing the big picture engineering-wise and still are. 
So they decided to become a manager to make sure the right technical decisions are made.</div><div> <br />They probably preferred their old job because they didn't have to spend so much time on politics and relating to people. They have been a full time manager for at least 5 years and their technical knowledge and judgement have dated badly. However they still see themselves as the person most qualified to make technical decisions. The more senior they become, the more out of date their technical knowledge becomes, yet the bigger and more expensive are the technical choices they make for their organisation. </div><div><br /></div><div><i>Most likely to say</i>: Never ask a bunch of developers to decide on technology, ask 5 and you get 5 different answers.<br /><i>Catchphrase</i>: We really need a big monolithic Java XML SOAP web service to do that. 🙈</div><div><br /></div><div><br /></div><div><b><span style="color: red;">10. The Rockstar Techie</span></b></div><div><br /></div><div>The very worst engineering manager is the rockstar techie.</div><div><br /></div><div>Rockstar techies have great technical skills and may have technically saved the day a few times for senior management. So their lack of personal skills is overlooked, and they tend to behave better towards people above them in the management hierarchy anyway.<br /><br />But in the long term they are damaging to the quality of your codebase; the more senior they become, the more damage they do. The common technical issues they cause are blocking the devolution of architectural decisions and diverse input into them; possessiveness over code or technical knowledge; and wanting to be the saviour for technical problems, outages etc. 
<br /><br /></div><div>However the damage they do to the code is minor compared to the huge damage they do to the engineering team and culture if they are put in a management position.</div><div><br /></div><div>They were often the most senior technical engineer in their part of the company. They have probably been there a while and to justify getting another pay rise they got lumped with doing management too. They regard management as an annoying burden tacked on to the side of their real engineering job. They aim to remain the lead engineer in the team and make all the final technical decisions. They cannot devolve technical authority and have no interest in picking up any management skills. So they are most likely to exhibit basic interpersonal failings: having a technical superiority complex, being rude, moody or bullying, micro-managing etc.</div><div> <br />As time goes on they must devote more and more time to management and yet cannot accept no longer being the most technically adept guy in the room. 
A paradox that can only be solved by driving away any members of staff who challenge them technically and effectively dumbing down the technical skills of the whole team.</div><div><br /></div><div><i>Most likely to say</i>: You Fffing broke it you moron, when coming across a feature change implemented in a way they didn't expect.</div><div><i>Catchphrase</i>: You are wrong, this is how it must be done, idiot 👹</div><div><br /></div><div><br />(Any resemblance to persons living or dead is purely coincidental)</div>Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-73140469261777884292023-01-27T14:34:00.014+00:002023-01-31T10:11:40.833+00:00Tech sector lay offs<p><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);"><i>INTRO: Having failed to post any technical articles for a few years I feel that my blog is at risk of dying from lack of attention. So to avoid that, I have decided to mix up its content a bit and diversify from long Technical HOWTOs to more casual short posts whose tech content may vary (or not exist at all) ... so to kick off this one is a short rant about tech news!</i></span></p><h3 style="text-align: left;"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">Fake News about Tech Industry Collapse</span></h3><p><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">A number of bigger tech companies are laying off staff at the moment. The press reports this as related to them making terrible errors in misreading trends post-pandemic and suddenly hiring way more staff than they normally would over the last year, 2022. Now reality has hit and the tech sector is awash with newly redundant workers as big tech desperately tightens its belt to survive. 
But is there any truth in either the premise of this argument or its conclusions?</span></p><p><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">A random recent example that repeats these ubiquitous assumptions ... <a href="https://fortune.com/2023/01/23/big-tech-layoffs-15-20-percent-next-six-months-top-analyst-says/">https://fortune.com/2023/01/23/big-tech-layoffs-15-20-percent-next-six-months-top-analyst-says/</a> </span><span style="font-size: inherit;">... citing the not-for-profit basket case, Twitter, being turned into a uniquely loss-making zombie company by Elon Musk, as though Google, Amazon and Microsoft had something in common with it as a business!</span></p><p><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">If you look at the graphs of the employee counts of these companies year on year, then only Microsoft hired more than usual last year, 2022, and Amazon, to cope with its pandemic boom, did so in 2020. Google last had an uncharacteristic hiring spree in 2012. Similarly, Facebook's and other big tech companies' growth curves in recent years exactly followed those of the last decade. There was no extra hiring.</span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">So why the layoffs? Nothing to do with over-hiring. Simple - look at the share price curves instead. 
<br />The market is valuing most big tech lower and recession looms - they need to chop staff to chop costs and make their finances look better to reduce that drop for their shareholders - the biggest of whom are the CEOs of those companies. <br /><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">The big companies are still making billions in profit (they are not loss makers), so over the long term it would cost them less to retain talent, and they can afford to. However the CEOs' personal short-term loss in wealth is something that they can't stomach, and it is a good excuse for a clear-out.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);"><br /></span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: 
var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">Obviously there are large loss makers, the most prominent of which is Twitter, but they are special cases with failing business models - Twitter only ever made a profit in the run up to Trump's election as it became a huge engine churning up ideological conflict with political and conspiracy fictions. Without a politically polarised USA to drive an explosion of lies and social media wars, it was always a loss maker. It has nothing to do with big tech trends. </span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; 
padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">Apple is a real exception to the trend, currently, but only because its share price hasn't dropped significantly yet. Hence it hasn't done its layoffs yet.<br /><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">So the real reason big profitable tech is laying people off is a temporary fix to save a few billion from the bear market's current swipe at the personal wealth of their CEOs, even though being a slave to short-term market-driven fire/rehire cycles will cost the company more in the long term. 
It is purely a personal choice to save personal wealth: there was no over-hiring, there is no need for redundancy, there is no downturn in the growth in demand for the tech sector, there are just fewer cheap loans around to fund it.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"> <br />The jump in the cost of borrowing is due to what looks to be a short-term hike in interest rates, nothing to do with the tech sector itself. It is largely due to Putin starting a war of revenge against his long-dead ghosts from 35 years ago, when the Soviet Union fell as he presided over the KGB. A war against ghosts can never be won ... 
but unfortunately its far more terrible human cost will carry on for years.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);">This becomes clear on the other side of the market coin, late stage startups are often making no profit at all - because all profits along with loans are ploughed into growth. 
Because they must never be seen to be shrinking - to keep growing their valuation for IPO.</span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><span color="var(--color-text)" style="font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal);"><br /></span></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">So a lot of software companies are hiring even though strictly they don't need to, whilst the big boys are firing when they don't need to.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: 
var(--artdeco-reset-base-vertical-align-baseline);"><br /></p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);">Meanwhile the number of software jobs as a whole keeps growing at 10% year on year. The demand continues to outstrip supply and wage inflation follows. So if you are a talented (if overpaid) software engineer ... I wouldn't worry too much about the layoffs. It is just a chance to take a redundancy bonus and a 20% pay rise to try something new. Unfortunately for those that were used as a disposable foreign human resource by big tech, via job dependent visas, it is <a href="https://www.linkedin.com/feed/update/urn:li:activity:7025204309470388224/?origin=SHARED_BY_YOUR_NETWORK">a different story</a>. They may well not have time to stop their CEO's thoughtless greed needlessly disrupting their lives.<br /><br />Many predictions are that interest rates will drop and the bear market end in a year's time. At which point the mass layoffs will be reversed. But the big tech companies will have lost a lot of money, and a great deal more trust, by following the market and each other so closely. 
The lesson employees will have learnt is that loyalty to such companies will never be rewarded or returned.</p><p style="--artdeco-reset-typography_getfontsize: 1.6rem; --artdeco-reset-typography_getlineheight: 1.5; border: var(--artdeco-reset-base-border-zero); box-sizing: inherit; color: var(--color-text); counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; cursor: text; font-size: inherit; font-weight: var(--artdeco-reset-typography-font-weight-normal); line-height: var(--artdeco-reset-typography_getLineHeight); margin: 0px; padding: 0px; vertical-align: var(--artdeco-reset-base-vertical-align-baseline);"><br style="background-color: white; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-family: -apple-system, system-ui, "system-ui", "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 16px; white-space: pre-wrap;" /></p>Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com1tag:blogger.com,1999:blog-6603837339236629698.post-82259903190183187312020-02-13T17:41:00.003+00:002020-02-14T10:06:34.714+00:00K8s Golang testing - Mocks, Fakes & Emulators<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "arial" , "helvetica" , sans-serif;">A lot of the Go code I write is developed against Google's Kubernetes API.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The API is fairly large and given that the code is mostly calling K8s then it inherently has a set of complex dependencies, these dependencies have time and costs associated to run up for real as K8s clusters in cloud providers data centres.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So how can we test K8s Go code ... or any Go code with significant dependencies. We must use substitute objects that simulate the dependency objects. There are three common terms for these substitutes known collectively as test doubles. They are mocks, stubs and fakes. Unfortunately these terms are all pretty <a href="https://en.wikipedia.org/wiki/Mock_object">interchangeable</a>. So before I start bandying them about, I had better define what I mean by these and related terms for this blog post ...</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Stub</b> = </span><span style="font-family: "arial" , "helvetica" , sans-serif;">A function, method or routine that doesn't do anything other than provide a shim interface. If a stub returns values then they are dummy values (possibly dependent on calling args, or call sequence) that are either fixed or generated for a fixed range.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Mock</b> = An object which replicates all or part of the interface of another object, using stub methods.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Fake</b> = An object that </span><span style="font-family: "arial" , "helvetica" , sans-serif;">replicates </span><span style="font-family: "arial" , "helvetica" , sans-serif;">all or part of the interface of another object, and has methods which are not all stubs, so some methods perform actions that simulate the actions of the real object's method.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span><span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Emulator (full fake)</b> = A package that has a significant amount of faked methods (rather than stubbed ones). For example a database server will normally provide an in memory database configuration that will completely replicate the core functionality of the database but not persist anything after the test suite is torn down.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Normally an emulator is not part of the code base and may require service setup and teardown via the test harness. As such, use of emulators tends to be for integration tests rather than unit tests.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Spy</b> = A stub, mock or fake that records any calling arguments made to it. This allows for subsequent test assertions to be made about the recorded calls.</span><br />
<h2 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">K8s Go Unit Testing</span></h2>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Given that unit testing by definition should not need any dependencies, then the assumption might be that for dependency heavy code, most unit tests will require test doubles ... the follow on assumption is that </span><span style="font-family: "arial" , "helvetica" , sans-serif;">double == mock.</span><br />
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Mocks</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Hence a standard approach to this is to use one of the Go's many generic mocking libraries, such as <a href="https://github.com/vektra/mockery">https://github.com/vektra/mockery</a>, <a href="https://github.com/stretchr/testify">https://github.com/stretchr/testify</a> or <a href="https://github.com/golang/mock">https://github.com/golang/mock</a>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">There are numerous tutorials and explanations available to get Gophers started with them, for example this walk through of <a href="https://blog.codecentric.de/2019/07/gomock-vs-testify/">testify and mock</a>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">These mock tools all offer test generation from introspection of your API calls to the dependency, to reduce the maintenance overhead.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So for everything we write tests for we generate mocks that reflect our codes API and confirm that it works as we expect. However there is a problem with this, the problem is described in detail in this blog post by <a href="https://sendgrid.com/blog/when-writing-unit-tests-dont-use-mocks/">Seth Ammons</a> or in summary by <a href="https://testing.googleblog.com/2013/05/testing-on-toilet-dont-overuse-mocks.html">Testing on the Toilet</a></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The issues with mocks are:</span><br />
<ol style="text-align: left;">
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Mocking your code's calls to an API only models your usage and assumptions about it, it doesn't model the dependency directly. It makes your tests more brittle and subject to change when you update the code.</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Mocks have no knowledge of the dependencies they are mocking so for example as Google's APIs change - your real code will fail, but your mocked tests will still pass.</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Mocks may use call sequenced response values, so making them procedurally fragile, ie changing the order of your test calls can break mocked tests.</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">If you want to swap out one library with another for a component, then because your mocks are specifically validating that libraries API, your mocks of it will need to be regenerated or rewritten.</span></li>
</ol>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So what is the alternative... </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Fakes</span></h4>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Refactor your code to be testable by using interfaces.</b></span></div>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b><br /></b></span></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Break things down into simpler interfaces and create fakes that implement the minimum methods needed for testing. Those methods should simulate some of the business logic, so that they test the code and the relationships between method calls better than pure stubs would. Your model of the dependency is then direct, rather than based just on your calls to its API, so it is arguably easier to debug when that model and the dependency (and your evolving use of it) diverge. </span><br />
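As a minimal sketch of this pattern (the interface, type and function names here are illustrative, not from any K8s or Google library), a narrow interface covers only the operations the code under test needs, and an in-memory fake implements it:

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectStore is a narrow, hypothetical interface covering only the
// operations our code actually uses from the real storage client.
type ObjectStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// FakeStore implements ObjectStore in memory, simulating just enough
// behaviour (missing-key errors) to exercise the calling code.
type FakeStore struct {
	objects map[string][]byte
}

func NewFakeStore() *FakeStore {
	return &FakeStore{objects: map[string][]byte{}}
}

func (f *FakeStore) Put(key string, data []byte) error {
	f.objects[key] = data
	return nil
}

func (f *FakeStore) Get(key string) ([]byte, error) {
	data, ok := f.objects[key]
	if !ok {
		return nil, errors.New("not found: " + key)
	}
	return data, nil
}

// CopyObject is the code under test: it depends only on the interface,
// so the fake can stand in for the real client in tests.
func CopyObject(s ObjectStore, from, to string) error {
	data, err := s.Get(from)
	if err != nil {
		return err
	}
	return s.Put(to, data)
}

func main() {
	store := NewFakeStore()
	store.Put("a", []byte("hello"))
	if err := CopyObject(store, "a", "b"); err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	data, _ := store.Get("b")
	fmt.Println(string(data))
}
```

Because CopyObject depends only on the ObjectStore interface, the same code runs unchanged against the real client in production and the fake in tests, and the fake's simulated not-found behaviour exercises the error path as well.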
<br />
<b><span style="font-family: "arial" , "helvetica" , sans-serif;">Use ready made Fakes</span></b><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">But back to K8s and Google APIs ... in some cases the Google component libraries already have fakes as part of the library. For example pubsub has pstest. So you can just add the methods required so that things work for your test. In which case faking can be simple ...</span><br />
<br />
<script src="https://gist.github.com/edcrewe/821000ba7d370e34602b22170fc422c3.js"></script>
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The client-go library has almost 20 fakes covering most of its components, but the only other ready-made fakes in the K8s and Google Cloud Go libraries (that I could find!) are for pubsub and helm:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">cloud.google.com/go/pubsub/pstest/fake.go</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">k8s.io/helm/pkg/helm/fake.go</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">k8s.io/client-go/tools/record/fake.go</span><br />
<span style="font-family: arial, helvetica, sans-serif;">k8s.io/client-go/discovery/fake</span><span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">k8s.io/client-go/kubernetes/typed/core/v1/fake</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">... etc</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">However there are also third party libs for fakes such as </span><br />
<a href="https://github.com/fsouza/fake-gcs-server"><span style="font-family: "arial" , "helvetica" , sans-serif;">https://github.com/fsouza/fake-gcs-server</span></a><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Use custom built Fakes</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">If there is no existing fake, or it doesn't fake what you need, you can write your own. However, for the Google libraries the APIs you need to replicate are not simple, and you may need to simulate a number of methods for your tests, so manually creating a fake and maintaining its API against the real Google API can become too much work compared with autogenerating mocks. </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br />Google have sensibly anticipated this and hence released </span><span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="http://github.com/googleapis/google-cloud-go-testing">google-cloud-go-testing</a></span><br />
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">This package provides the full set of interfaces of the Google Cloud Client Libraries for Go, so there is no need to generate partial mock interfaces or maintain your own fake versions of its APIs.</span></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">As an example it can be used to create a fake GCS service, where data is just written to memory (in the global bucketStore variable)</span><br />
<br />
<script src="https://gist.github.com/edcrewe/f68f7fbe86b20abf8ed55ff4cc9e4da9.js"></script>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The test substitutes the FakeClient for the real storage client. In order for the code to accept either the real or the fake client as the same type, the library provides an AdaptClient function so that both conform to the storage interfaces package (stiface).</span><br />
<pre style="background-color: white; font-family: Menlo; font-size: 9pt;">c, err := storage.NewClient(ctx, option.WithCredentialsFile(apiCredsFilename))
client = stiface.AdaptClient(c)</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h2 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">K8s Go integration Testing</span></h2>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For integration tests you ideally want to use the real dependencies, but if they are too slow or costly then they may well be best replaced by emulators.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Using gcloud emulators</b></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b><br /></b></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Google also provides a number of full emulators to cater for speedy local integration testing,</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://cloud.google.com/sdk/gcloud/reference/beta/emulators">https://cloud.google.com/sdk/gcloud/reference/beta/emulators</a> which cover </span><span style="font-family: "arial" , "helvetica" , sans-serif;">bigtable, </span><span style="font-family: "arial" , "helvetica" , sans-serif;">datastore, </span><span style="font-family: "arial" , "helvetica" , sans-serif;">firestore and </span><span style="font-family: "arial" , "helvetica" , sans-serif;">pubsub.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">So as part of your integration test setup you can fire up the datastore emulator, for example:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"></span><br />
<pre style="background-color: white; font-family: Menlo; font-size: 9pt;">> export DATASTORE_EMULATOR_HOST=localhost:17067</pre>
<pre style="background-color: white; font-family: Menlo; font-size: 9pt;">> gcloud beta emulators datastore start --no-store-on-disk --consistency=1.0
  --host-port localhost:17067 --project=my-project</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The datastore client can then just be connected to the emulator for testing:</span><br />
<pre style="background-color: white; font-family: Menlo; font-size: 9pt;">client, err := datastore.NewClient(context.Background(), "my-project")</pre>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><b>Using EnvTest and a local K8s API server</b></span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">The </span><span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest">EnvTest</a> package creates a Kubernetes test environment that will start / stop the K8s control plane and install extension APIs. The K</span><span style="font-family: "arial" , "helvetica" , sans-serif;">8s API server (and its etcd store) is by default a local emulator service (although it can also be pointed to a real K8s deployment for testing if desired).</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">EnvTest is wrapped up as part of Kubebuilder which is the primary SDK for rapidly building and publishing K8s APIs in Go. </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">EnvTest caters for testing complex Kubernetes API calls of the type that might be required for testing a K8s operator, for example. Hence when generating code for building an operator, <a href="https://book.kubebuilder.io/reference/testing/envtest.html">kubebuilder</a> uses the controller-runtime in its boilerplate for running this up for a template integration test.</span><br />
<br />
<script src="https://gist.github.com/edcrewe/e8f87e606ab4a3d4fb6b9c46a8ca4a1f.js"></script>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h3 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Summary</span></h3>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So if your K8s Go code is tested only by mocks for unit tests, and by running up a real Kubernetes cluster for integration tests, then maybe it's time to re-evaluate your testing approach and start using the tools for fakes and emulators that are available. The only issue is that they are quite numerous, with a mix of sources, so picking the right combination of Google internal library, Google or third party test package, and custom built fakes and emulators becomes part of the task.</span></div>
<div>
<br /></div>
</div>
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-4598318034148052782019-09-01T15:35:00.001+01:002019-09-02T10:44:01.206+01:00Teaching an old Pythonista new Gopher tricks<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: "arial" , "helvetica" , sans-serif;">I recently got a new job where I need to write a lot of Golang, so needed to learn it.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">I figured that you don't really learn a language unless you try to write code that actually does something useful. However, having been to a recent Golang meetup where someone had come to a similar conclusion and had written a full emulator of the <a href="https://github.com/Humpheh/goboy">Gameboy in Go</a>, I also figured I wanted to do something that was not quite so complex or low level ... i.e. something that could hopefully be done in a week.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So I decided to take the plunge by creating an open source package that does the same job, as a Python one that I released many years ago called <a href="https://github.com/edcrewe/django-csvimport">django-csvimport</a>. A simple add-on for the Django ORM that caters for loading data to models from CSV files, with the option to generate the model code from scratch for a CSV file by checking the data fields and determining the data type for each column.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Also, doing a task where I had already solved the problems in another language would mean I could just focus on how Golang might approach the problem, not the problem itself. So this post is about the practical differences between writing a Python and a Golang solution. As such it compares the languages as tools for a certain job, which I hope is complementary to the many posts that compare the languages themselves. Suffice it to say, they differ in many ways ... most significantly in static vs. dynamic typing ... whilst being most similar in regarding readable, consistent, simple syntax as paramount - where other languages have different priorities - hence for both, auto-formatting code is good practice, with Go's built-in gofmt doing the job of Python's <a href="https://docs.google.com/presentation/d/1rpQlJTv9uBWicuu2cQURG1Yfu-j4cK-EDd9LwbD-SdA">black or yapf</a>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So firstly, Django is one of the leading full web frameworks for Python, so what is the equivalent for Go? Gorilla, Gin, Buffalo etc. - there are plenty of frameworks, but which is the leading one with an ORM? I tried out a couple, but reading around it became apparent that if you choose to develop a web app in Go, the majority of devs don't use a framework at all! So already the differences in the languages were becoming apparent. Reasons? If you choose Go for creating a web app then performance may be a significant requirement, and even micro frameworks can be slower than raw code. Go is a recent language and as such has lots of web related features built into the core already ... templating, etc. - and even imports are URL based - so a web framework in Go gives you less than it does in Python.</span><br />
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So instead I checked out Go ORMs and decided to write an extension package for <a href="https://gorm.io/">Gorm</a> as one of the leading Go ORMs.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So ditching the Web Framework / UI integration features of django-csvimport as an unnecessary extra, then the problem just consists of two parts, creating ORM model definitions that create relational database tables and parsing the CSV files to import the data to those tables.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">From this high level spec. the core functional components that compose the tool that we want to rebuild in Golang are:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<ol style="text-align: left;">
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">CLI interface to take arguments specifying source files and actions to perform</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">An ORM to manage vendor independent database schema creation and population</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Utility to inspect data sources and determine data types</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">Template tool to create ORM models (metaprogramming)</span></li>
<li><span style="font-family: "arial" , "helvetica" , sans-serif;">CSV parser to read in CSV files - ideally capable of handling various formats and poor or inconsistent formatting - ie real CSV files!</span></li>
</ol>
</div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For all of these we would hope that language level packages are available to do the heavy lifting. Then the package can just knit them together into a CSV to relational database import utility.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So stepping through these and rating Go vs Python...</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
CLI framework (draw)</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">As a minimum, our task requires a command line utility to point to the CSV data files to be imported.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Django comes with a CLI framework in the form of management commands. For our Go CSV import, <a href="https://github.com/edcrewe/gormcsv">gormcsv</a>, we just have the ORM so we could roll our own CLI handling, but in this case that is probably not a great idea, since like Python, Go has a dominant CLI framework - <a href="https://github.com/spf13/cobra">Cobra</a> equates to Python's <a href="https://click.palletsprojects.com/">Click</a>. It uses the <a href="https://github.com/spf13/viper">Viper</a> config framework which is like Python's core <a href="https://docs.python.org/3/library/configparser.html">configparser</a> lib with extras. Within the gormcsv module I created these CLI command go files as a cmd package via Cobra's autogenerate feature and used them to wrap the importcsv.go and inspectcsv.go source files in the importcsv package that do the real work.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
ORM (draw)</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Any language's leading ORMs should cope with the database management and data population tasks, and GORM is functionally similar in its capabilities to the Django ORM.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
Data source introspection tool (Python win)</span></h4>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><a href="https://messytables.readthedocs.io/en/latest/">Messytables</a> is a mature package designed for the task of scraping in data from various heterogeneous third party sources - possibly of poor quality. As such it is one of the many utilities created around Python's well established role in the data analytics realm. Go has no such tool. There is no third party package to cater for inspecting, type checking and cleaning up data sources :-(</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So we have to make our own much simpler data inspector, which will hopefully cope OK with the most common data types if they are reasonably consistently formatted.</span></div>
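A drastically simplified sketch of such an inspector (the function is invented for illustration, not the actual gormcsv code): try progressively stricter parses on each value and fall back to string:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// detectType guesses a column type for a single CSV value by trying
// progressively stricter parses, falling back to string. This is a
// toy version of what a data inspector must do per column.
func detectType(value string) string {
	if _, err := strconv.ParseInt(value, 10, 64); err == nil {
		return "int"
	}
	if _, err := strconv.ParseFloat(value, 64); err == nil {
		return "float"
	}
	if _, err := strconv.ParseBool(value); err == nil {
		return "bool"
	}
	// Only one common layout is tried here; real data needs many.
	if _, err := time.Parse("2006-01-02", value); err == nil {
		return "date"
	}
	return "string"
}

func main() {
	for _, v := range []string{"42", "3.14", "true", "2019-09-01", "hello"} {
		fmt.Printf("%q -> %s\n", v, detectType(v))
	}
}
```

A real inspector would sample many rows per column, try many date layouts, and widen the column type (e.g. int to float to string) whenever values disagree.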
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
Templating tool for creating models (Go win)</span></h4>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For GORM and Django the ORM models are implemented directly as classes in the language rather than using an intermediate DSL or XML etc. So to create models based on introspecting source data metaprogramming must be used to generate code.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Templates are available in the core of Go. Also, given that Go is statically typed and has no generics, the best alternative for some problems that generics would solve is metaprogramming, so templated generation of Go code is a normal Go pattern. Arguably then, this is better (core) supported in Go than in Python. For Python, code generation is rarely needed, and my original django-csvimport implementation just uses string construction and didn't even employ one of Python's many add-on template packages, e.g. Django or Jinja2 templates (hmm, needs a rewrite!).</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Note that both languages have fully functional reflection / introspection libraries in the core.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
CSV Parser (Python win)</span></h4>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">Most important to this application is the quality of the CSV parser. This is where Go is sadly completely let down. Its CSV parser is frankly inadequate and can only cope with CSV that is strictly formatted according to <span style="background-color: white; color: #3e4042;">RFC 4180.</span></span></div>
<div style="text-align: left;">
<span style="background-color: white; color: #3e4042;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></span></div>
<div style="text-align: left;">
<span style="background-color: white; color: #3e4042;"><span style="font-family: "arial" , "helvetica" , sans-serif;">To quote from Python's <a href="https://docs.python.org/3/library/csv.html">csv parser library</a> ...</span></span></div>
<div style="text-align: left;">
<span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;"><br /></span></div>
<div style="text-align: left;">
<i><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; text-align: justify;">CSV format was used for many years prior to attempts to describe the format in a standardized way in </span><span class="target" id="index-1" style="background-color: white; color: #222222; text-align: justify;"></span><a class="rfc reference external" href="https://tools.ietf.org/html/rfc4180.html" style="background-color: white; color: #6363bb; text-align: justify;"><strong>RFC 4180</strong></a><span style="background-color: white; color: #222222; text-align: justify;">. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources.</span></span></i></div>
<div style="text-align: left;">
<i><span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;"><br /></span></i></div>
<div style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; text-align: justify;">TBH Python 3's CSV parser is itself significantly more strict about format than the old Python 2 one, so certain CSV files that Python 2 happily dealt with cannot be parsed - largely due to the switch to unicode resulting in more character encoding related critical failures. However the Go parser is a whole other level of strict: realistically it can probably handle less than 10% of the real world CSV source files out there that you might want to scrape data from into a database, whilst Python 3's can probably cope with over 80%.</span></span></div>
<div style="text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;"><br /></span></span></div>
<div style="text-align: left;">
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;">I also investigated third party Go libraries that cater for parsing a more realistic range of CSV formatting, but found none that did so.</span></span><br />
<span style="font-family: "times" , "times new roman" , serif;"><span style="background-color: white; color: #222222; font-family: "arial" , "helvetica" , sans-serif; text-align: justify;"><br /></span></span></div>
<h4 style="text-align: left;">
<span style="font-family: "arial" , "helvetica" , sans-serif;">
Conclusion</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So in conclusion, Python may not be a Gopher Snake, but for this task it does rather eat Go for breakfast. There is no ready made third party package to deal with ingesting unknown or badly formatted data like Python's aptly named <a href="https://github.com/okfn/messytables">messytables</a>. Golang may sometimes be used for writing performant concurrent data processing in data science ... but it isn't used for the scraping and cleaning data sources part of the job! However that is a minor issue compared to the major blocker of not having an existing library that can import real world (i.e. sloppy format) CSV files.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">So I have written my Go package for pushing CSV files to databases, <a href="https://github.com/edcrewe/gormcsv/issues/1">gormcsv</a>, and due to Go's great concurrency features it could certainly beat django-csvimport hands down in speed terms where big data quantities of CSV sources need ingesting. But I have yet to release it, because with such poor compatibility with real CSV files there doesn't seem to be much point. However I will hopefully persist in finishing things off, probably via a less performant work around that pre-cleans CSV files into strict RFC 4180 prior to parsing, since implementing my own CSV parser from scratch in Go would likely break my original goal of coming up with an open source project in the language that would take no longer than a week!</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Oh and what do I think of Go? Well I like it, I most like the concept of classes just being data structs with bags of composed methods loosely coupled to them. I least like the error handling unseparated from normal code flow ... since it can lead to poor readability of code due to the excessive error boilerplate stuck within the program flow. It is my new favourite (statically typed) language ... but it hasn't replaced Python as my overall favourite.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
</div>
</div>
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-83196653445150397862015-08-02T17:45:00.003+01:002015-08-02T17:45:27.791+01:00Using Java Spring & MyBatis for dynamic schema integrationUsing Spring MyBatis for dynamic schema integration may perhaps be subtitled, "Eating soup with a fork" ... since that is how it has felt at times. However that may partially be due to my lack of familiarity with these tools. So this blog posting is about why I got that feeling, how I bent my fork to the purpose ... and if I missed some tricks in doing that. Please let me know by commenting below :-)<br />
<br />
<h4>
Object-relational mapping - a personal history</h4>
I have been developing relational database applications for many years so I have been through the various stages of database persistence approaches. Starting with fairly raw SQL development or at most just fixed DAO wrapping. Through SQL templating languages and mapping approaches to largely adopting the full object-relational mapping, ORM, approach for the past few years. I even tended to use the same pseudo ORM top level for NoSQL sources, although of course a lot of ORM features then become inapplicable, such as use of an ORM model's relational navigation (eg. person.department.head.surname)<br />
<br />
Having said that, the <a href="https://docs.google.com/presentation/d/1lnuV1LJgYuraWTratuJgiVVj1-SqQ8HOXmFVUPpNcz8/edit#slide=id.p">object-relational impedance mismatch</a> can still bite. However good an ORM is, there is some need to understand what its API simplifications and abstractions are doing, to avoid excessive or badly performant SQL. But given that caveat, the development and maintenance time saved by a well implemented full ORM inevitably saves money on all but the simplest persistence requirements. Similarly the automation involved in schema migration, constraints, indexing and other data modelling needs is invaluable - as long as one knows what it is doing under the hood, and hence when that 10-20% of database development / customised data design is required: for example custom functions and triggers, data writable views that present as a table for the ORM, etc.<br />
<br />
<h3>
MyBatis - is it an ORM I see before me?</h3>
Recently I have started working with a team who use an older Java technology set of Spring and MyBatis. My current project uses the Java MyBatis SQL framework to integrate various relational database sources.<br />
MyBatis badges itself as a persistence framework. So it is not really claiming to be an ORM. It started as an SQL templating engine and that is still where its core skills lie. But it now has many other features bolted on. Including meta programming to generate Java relational mappers from database sources.<br />
<br />
So it does have many ORM-ish features. However using MyBatis has been rather a painful journey for me, since I have found that it fails to implement many ORM design pattern principles that my ORM habituation had led me to expect. Also, using MyBatis with Spring MVC means we haven't got proper MVC, since the model is not a model as it would be with <a href="http://hibernate.org/orm/">Hibernate</a> ... it's a MyBatis Mapper.<br />
This is not just a minor niggle; on first use a core ORM feature is gone. A Mapper is not an object model of a table's row of data, with transaction and session management under the hood. Instead it is just a convenient object on which to hook up standard data queries and updates, with the management of saves, and the synchronisation of actions on Mappers and the results returned by them, entirely manual. So in that sense you have less than what some more basic DAO wrappers give you, i.e. when you update a record then fetch the values from that record, they will return the data written into and queried back from the database - hence an inserted, saved model would automatically hold the new primary key(s).<br />
With Mappers, the data readable from the Mapper will just be the data set to be updated, not the actual data held in the database as a result of that update. To find out the result of your query you need to ensure you get a fresh query result, by requerying or flushing the session.<br />
<br />
There is a workaround for this common requirement: you can annotate an insert with a custom select query to tell MyBatis to re-query just for the key manually, before or after doing the update. However I found this didn't work well for Oracle, and was a problem anyway for what I required, largely because I was using Generator based classes, so adding custom annotations to these would not have worked for me.<br />
<br />
<pre class="CICodeFormatter">@SelectKey(statement="SELECT MY_SEQUENCE.NEXTVAL FROM DUAL", keyProperty="nameId", before=true, resultType=int.class)
int insertMyTable(Name name);
</pre>
<br />
Similarly there is no lazy update concept: updates are only and always done when you specifically run them. In effect it's like using an ORM with all the relational modelling, data synchronisation machinery and transactional management disabled. Mappers are instead just convenience wrappers for running SQL - <a href="http://mybatis.github.io/generator/">MyBatis Generator</a> may do the ORM process of creating Java classes that (via XML) map to database tables or views, but mapper instances are not ORM-synchronised table row objects.<br />
<br />
The normal object-centric approach is also not available. Whilst you can generate code from your database, you cannot generate your database from code. So full persistence data cycle management is not available out of the box. By that I mean the use of test harness and deployment tools that destroy and recreate the full database, populate it from fixtures, run tests and drop it again. A migration tool that introspects databases and can migrate back and forth between schema versions to sync them with code versions or vice versa, ie. code -> data schema management, in addition to the MyBatis Generator data -> code direction. So the sort of thing that a full stack web framework like <a href="https://docs.djangoproject.com/en/1.8/topics/db/">Django</a> has with its ORM.<br />
<br />
This was a blow for my particular project, since full data cycle management was exactly what I required for developing a central data aggregation and integration database. I also required proper fixture management for testing, with the same serialised data input / output components being used for the main work of aggregating data from sources in serialised formats (XML, CSV etc.)<br />
Finally the system was dealing with pulling in, and delivering back out, complex evolving schemas from source to consumer databases. The one thing that would be essential is that all the schema handling and data typing be dynamic, so that the code can handle this rate of change without constantly breaking and having to be rewritten. So hard coded schema details and static typing must be avoided for any chance of maintainability. But it was to be written in a statically typed language. Again the tool was not suited to the job. So how did I bend this seemingly unsuitable platform to the task in hand?<br />
<br />
<h3>
Bending the fork</h3>
The first step was to tackle the dynamic schema vs. static typing issue - Java reflection is your friend here. Then reinvent some of the missing ORM wheels: fixture loading and schema generation.<br />
Along with working around the lack of object data synchronisation and automatic session management. Finally we wanted to make our data population speedy by only updating data that needed updates, so a hashing mechanism that could work with the serialised data was required.<br />
Note that our in-house Java standards decree that all configuration should be done via annotations rather than XML where available ... so that is pretty much everything in Spring ... aside from the core MyBatis Mapper XML files. The aggregation database was Oracle, and various databases including MS SQL Server were the data sources.<br />
<br />
<h4>
Dynamic schema handling</h4>
Use MyBatis Generator, adding a config file for each database source and for the target aggregation database.<br />
Add the name of each table or view required to these config files ... eg.<br />
<pre class="CICodeFormatter">&lt;table tableName="PERSON" modelType="flat" /&gt;</pre>
<br />
Ideally that would be all the schema customisation required, as long as the data source and target naming of tables and columns matched. However in a handful of cases mapping data (eg. tableA.colB = tableC.colD) was needed in the properties files.<br />
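Parsing such mapping entries is straightforward with the standard Properties class. A minimal sketch, assuming a "tableA.colB = tableC.colD" line format - the ColumnMappings class and the example table / column names are illustrative, not the real project's:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch: parse "tableA.colB = tableC.colD" style mapping
// entries from a properties file into a source -> target column lookup.
public class ColumnMappings {

    // Map of "TABLE.COLUMN" in the source to "TABLE.COLUMN" in the target.
    public static Map<String, String> load(String propsText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propsText));
        Map<String, String> mappings = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            mappings.put(key.trim(), props.getProperty(key).trim());
        }
        return mappings;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> m = load("PERSON.SURNAME = STAFF.LAST_NAME\n");
        System.out.println(m.get("PERSON.SURNAME")); // STAFF.LAST_NAME
    }
}
```

In the real configuration these exceptions to the name-matching default would be read alongside the generator config for each source database.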
<br />
I ran this as part of the Maven build at first, but it proved more manageable to be able to run code generation for the source and target databases at different times, so I switched to wrapping it up in a command class triggered independently via a batch job scheduler, in the same way as the data population commands were to be run. Schema -> code generation tended to be much faster than either data population or code -> schema generation, so frequent execution was not an issue.<br />
<br />
The core class I wrote for enabling dynamic schema handling was called TableMeta and each table has an instance of this class listing all the metadata about its columns and primary keys etc.<br />
<br />
This class was injected into a ModelFactoryService whose job is to return results or add, edit and delete models via any Mapper.<br />
The factory service has a reflection based method for invoking the MyBatis generated Mapper methods, e.g. "selectByExample" ...<br />
<br />
<pre class="CICodeFormatter">protected Object invokeMapperMethod(String methodName, Object parameter) {
    Class&lt;?&gt; klass = exampleClass;
    if (parameter != null) {
        klass = parameter.getClass();
    } else {
        parameter = newExample();
    }
    try {
        Method method = mapperClass.getDeclaredMethod(methodName, klass);
        try {
            // Run the method
            return method.invoke(mapper, parameter);
        } catch (IllegalAccessException | IllegalArgumentException
                 | InvocationTargetException exRun) {
            Logger.getLogger("ModelFactoryService").log(
                Level.SEVERE, modelClass.getSimpleName() + "." + methodName, exRun);
        }
    } catch (NoSuchMethodException | SecurityException exCall) {
        Logger.getLogger("ModelFactoryService").log(
            Level.SEVERE, "Mapper class has no " + methodName + " method", exCall);
    }
    return 0;
}
</pre>
<br />
<br />
... similarly there were reflection based set and get column methods, surfaced by whole-Mapper data modification methods such as doUpdate(Object model).
<br />
The TableMeta caters for the initial setup of the classes generated for that particular table by MyBatis Generator. Straight use of Class.forName(className) works, with manipulation of the table name string copying the Generator's standard naming convention. The table's Mapper, Model and Example classes are then added as the template classes for the factory. From that, a hash of the Setter and Getter Mapper methods for each of the table's columns can be built automatically.<br />
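The naming convention manipulation and the reflection built setter hash can be sketched in plain Java. Here the TableMetaSketch class and the Person stand-in model are illustrative names, not the project's real classes - the Person class takes the place of a real MyBatis Generator model:

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the Generator's CamelCase naming convention and
// of building a column-name -> setter Method map via reflection.
public class TableMetaSketch {

    // MY_TABLE -> MyTable, matching the Generator's class naming convention.
    public static String toClassName(String tableName) {
        StringBuilder sb = new StringBuilder();
        for (String part : tableName.toLowerCase().split("_")) {
            if (!part.isEmpty()) {
                sb.append(Character.toUpperCase(part.charAt(0)))
                  .append(part.substring(1));
            }
        }
        return sb.toString();
    }

    // Build a map of column name -> setter Method from a model class.
    public static Map<String, Method> setterMap(Class<?> modelClass) {
        Map<String, Method> setters = new HashMap<>();
        for (Method m : modelClass.getMethods()) {
            if (m.getName().startsWith("set") && m.getParameterCount() == 1) {
                setters.put(m.getName().substring(3).toUpperCase(), m);
            }
        }
        return setters;
    }

    // Stand-in for a MyBatis Generator model class.
    public static class Person {
        private String surname;
        public void setSurname(String surname) { this.surname = surname; }
        public String getSurname() { return surname; }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toClassName("MY_TABLE") + "Mapper"); // MyTableMapper
        Person p = new Person();
        setterMap(Person.class).get("SURNAME").invoke(p, "Crewe");
        System.out.println(p.getSurname()); // Crewe
    }
}
```

The real TableMeta would pass the derived names to Class.forName to load the Mapper, Model and Example classes, rather than hard-coding a stand-in.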
<br />
<h4>
Fixture loading</h4>
A SAX parser based loader class reads in data fixtures in XML format. Each fixture row is parsed to a hash of tag name to value, which can be passed via a SaverService to the ModelFactory.<br />
An ObjectConvertor static class caters for de-serialising fixture data to the correct Java types, looked up via the column name from TableMeta; the Setter hash can then be used to update the matching named columns in the model.<br />
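The row-to-hash parsing step can be sketched with the JDK's SAX support. The element names ("row" and lower-cased column tags) follow the dump format described later in this post, but the FixtureLoader class itself is an illustrative stand-in for the real loader:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch of a SAX based fixture loader: each <row> element is
// parsed into a map of column tag name -> text value, ready to hand to a
// saver service for type conversion and model population.
public class FixtureLoader extends DefaultHandler {
    private final List<Map<String, String>> rows = new ArrayList<>();
    private Map<String, String> row;
    private StringBuilder text;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("row".equals(qName)) {
            row = new HashMap<>();      // start collecting a new fixture row
        } else if (row != null) {
            text = new StringBuilder(); // start collecting a column value
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("row".equals(qName)) {
            rows.add(row);
            row = null;
        } else if (row != null && text != null) {
            row.put(qName.toUpperCase(), text.toString());
            text = null;
        }
    }

    public static List<Map<String, String>> parse(String xml) throws Exception {
        FixtureLoader handler = new FixtureLoader();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fixture><row><surname>Crewe</surname></row></fixture>";
        System.out.println(parse(xml)); // [{SURNAME=Crewe}]
    }
}
```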
<br />
<h4>
Fixture dumping</h4>
For fixture dumping I cheated a little. Since I was only ever dumping from the target Oracle database, then rather than reinventing <a href="https://code.djangoproject.com/wiki/Fixtures">full serialisation</a> of models (e.g. an ORM with built in serialisation from any database to XML, JSON, YAML, CSV etc.) I just provided a tool to do the minimum required for my testing and development needs - to be able to serialise any table or view to XML from Oracle.<br />
So in this case, to avoid the maintenance madness of a separate statically typed, fixed schema solution for each table - I did the XML part in Oracle.<br />
That way I could use a GenericMapper which generated two string fields based on a dynamic SQL query built from the TableMeta: the first being the row XML, and the other the concatenated list of the serialised columns to be contained within it. The MyBatis @SelectProvider annotation allows the gluing on of a method taken from another class to generate SQL ...<br />
<br />
<pre class="CICodeFormatter">@SelectProvider(type = TableDump.class, method = "getXMLProvider")
List&lt;GenericModel&gt; getXML(@Param("meta") TableMeta meta, @Param("rows") final int rows);
</pre>
<br />
The method glued on is this one which uses Oracle's native XML methods and the TableMeta's list of columns to generate a query that directly returns the XML..
<br />
<br />
<pre class="CICodeFormatter">/**
 * Dynamic SQL generator method generates XML output for fixture as row and cols
 */
public String getXMLProvider(Map&lt;String, Object&gt; params) {
    final TableMeta meta = (TableMeta) params.get("meta");
    final int rows = (int) params.get("rows");
    String sql = new SQL() {
        {
            String rowXML = " '' || XMLElement(\"row\", XMLAttributes(";
            for (String col : meta.getPKeys()) {
                rowXML += col + " AS \"" + col.toLowerCase() + "\", ";
            }
            rowXML = rowXML.substring(0, rowXML.length() - 2) + ")) \"ROWKEYS\"";
            SELECT(rowXML);
            String queryXML = "''";
            for (String col : meta.getCols()) {
                queryXML += " || XMLElement(\"" + col.toLowerCase() + "\", " + col + ")";
            }
            SELECT(queryXML + " \"ROWCOLS\"");
            FROM(meta.getTableName());
            for (String col : meta.getPKeys()) {
                ORDER_BY(col);
            }
            if (rows > 0) {
                WHERE("rownum &lt;= " + rows);
            }
        }
    }.toString();
    return sql;
}
</pre>
<br />
This above snippet shows an example of MyBatis SQL templating - the core of MyBatis.<br />
The result is a single GenericMapper that, called with the appropriate table's TableMeta, can return an XML dump of any table in the target database. For test fixture generation this can be passed a result length ... since usually ten or twenty rows will be sufficient for integration testing.<br />
<br />
<h4>
Save methods</h4>
<div>
I won't go into the save methods in detail - suffice to say the data was hashed on the way into Oracle, and then a query of primary keys to last-modified hashed data was returned for all rows, to allow incremental update by hashing each of the XML data source's rows and checking it first to determine whether an update or insert was required. Whilst this could not take advantage of the database's or Java's native hashing - since it had to be compatible with the serialised data - it did have a significant impact. The rate of incremental updates meant that we were getting a maximum of 1% data churn per table, so even with all the hashing and comparison overhead, an incremental update is still around 30 times faster.</div>
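The incremental check described above can be sketched as follows. The IncrementalHasher class, the choice of SHA-256, and the in-memory hash store are all illustrative assumptions - in the real system the previous hashes came back from an Oracle query of primary keys to last-modified hash values:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the incremental update check: each serialised
// source row is hashed, and only rows whose hash differs from the one
// recorded at the last load need to be written to the database.
public class IncrementalHasher {
    // Primary key -> hash of the row data as recorded at the last load.
    private final Map<String, String> lastHashes = new HashMap<>();

    // Hash the serialised row text; hex-encoded so it is storable as a column.
    public static String hash(String serialisedRow) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(serialisedRow.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // True if this row needs an insert or update; records the new hash.
    public boolean needsWrite(String primaryKey, String serialisedRow) throws Exception {
        String h = hash(serialisedRow);
        String previous = lastHashes.put(primaryKey, h);
        return !h.equals(previous);
    }

    public static void main(String[] args) throws Exception {
        IncrementalHasher hasher = new IncrementalHasher();
        System.out.println(hasher.needsWrite("1", "<surname>Crewe</surname>")); // true
        System.out.println(hasher.needsWrite("1", "<surname>Crewe</surname>")); // false
    }
}
```

Hashing the serialised form, rather than the database's own values, is what keeps the comparison compatible with the XML / CSV input side of the pipeline.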
<div>
<br />
For versioned data, a core data table and a versioning table were required, and both then needed access to the new primary keys. As mentioned, this is not straightforward in MyBatis - so the easiest solution was to have a bespoke version Mapper that just wrapped a single versioning sequence, and could be used to update the related data and version tables by calling the custom VersionMapper's nextVal or currVal methods.</div>
<div>
<br />
<h4>
Code Schema Cycle</h4>
As mentioned, MyBatis doesn't cover the code to schema half of the persistence cycle, so the option was to employ a separate full schema life cycle framework such as <a href="http://www.liquibase.org/">Liquibase</a>, or roll my own. In this case my requirements were not for extensive migration features, since as an aggregation database the schema could be snapshotted, dropped and rebuilt. So to avoid complexity and further dependencies I just added a tool to build the Oracle schema from standard dumps of it, made with an Oracle client tool. You dump each object to a separate DDL SQL file, then the tool checks all the files, works out which object type each relates to, and loads them in a sequence which should prevent referential integrity clashes, ie.<br />
<br />
DATABASE LINK, SEQUENCE, TABLE, MATERIALISED VIEW, FUNCTION, VIEW, INDEX, PROCEDURE, TRIGGER, CONSTRAINT, FOREIGN KEY<br />
<br />
This is wrapped up as a DatabaseReset command that either drops all the data or the full database, and rebuilds the schema. So a run of reset followed by the GenerateMappers command gives a freshly built persistence layer and the Model code to talk to it. It would be nice to have the schema built from the Mapper code, and a more standard, coherent code schema cycle, but given Mappers are in use, that would not be available even if Liquibase were in the mix.<br />
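The dependency-safe load ordering can be sketched as a simple sort over the dumped file names. The DdlLoadOrder class and the "NAME.TYPE.sql" file naming convention are illustrative assumptions - the real tool would detect the object type however the client tool's dump names its files:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of ordering dumped DDL files so that dependent
// objects load after the objects they reference.
public class DdlLoadOrder {
    private static final List<String> ORDER = Arrays.asList(
        "DATABASE LINK", "SEQUENCE", "TABLE", "MATERIALISED VIEW", "FUNCTION",
        "VIEW", "INDEX", "PROCEDURE", "TRIGGER", "CONSTRAINT", "FOREIGN KEY");

    // e.g. "PERSON.TABLE.sql" -> rank of "TABLE" in the load sequence.
    // Assumes the object type is the second-to-last dot-separated part,
    // with underscores for spaces in multi-word types.
    public static int rank(String fileName) {
        String[] parts = fileName.split("\\.");
        String type = parts.length > 1 ? parts[parts.length - 2].replace('_', ' ') : "";
        int i = ORDER.indexOf(type);
        return i < 0 ? ORDER.size() : i; // unknown types load last
    }

    public static List<String> sorted(List<String> fileNames) {
        return fileNames.stream()
                .sorted(Comparator.comparingInt(DdlLoadOrder::rank))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "PERSON_PK.CONSTRAINT.sql", "PERSON.TABLE.sql", "PERSON_SEQ.SEQUENCE.sql");
        System.out.println(sorted(files));
        // [PERSON_SEQ.SEQUENCE.sql, PERSON.TABLE.sql, PERSON_PK.CONSTRAINT.sql]
    }
}
```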
<br /></div>
<div>
<h3>
Spring issues</h3>
So as a newbie to MyBatis I guess I had some problems with it. But as long as this slightly irascible post already is ... guess what, I had issues with Spring too. So I will make it even longer by getting those off my chest as well :-)<br />
<br />
It seems to require huge amounts of configuration to do what you want. Again it can be made to do most things, but ... maybe to some extent because of that ... it takes an awful lot of configuration to do some of the things a more opinionated full stack web framework would do out of the box. I guess I need to come across the sort of edge cases which are really difficult in a full stack framework to appreciate Spring, since currently it feels like it needs a great deal of maintenance-heavy tinkering to do some of the basics.<br />
<br />
So the first surprise was that all my Spring @Service classes default to singletons. In order to have, for example, a ModelFactory for each of the two different tables a versioned update required, I needed to build a BeanFactory, make these beans, and annotate them as prototypes - ie. normal classes, not singletons. I guess this is because all of this related to command classes, not Spring MVC web classes ... which would have had web session scope.<br />
<br />
<pre class="CICodeFormatter"> @Bean
@Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public ModelFactoryService modelService() {
return new ModelFactoryService(prop, hashService());
}
</pre>
<br />
So all my individually injected service classes tended to have to be un-injected, and the bean factory injected in their place.<br />
<br />
Along with that, the database session to mapper class connection seemed rather frail.<br />
The only way to ensure this worked was never to use a generic session handling method, but always to use separate prototype beans for separately named session classes, annotated to find the correct mappers for that particular database ...<br />
<br />
<pre class="CICodeFormatter">@Configuration
@MapperScan(basePackages = "uk.ac.bris.adt.erp.dataint.sources.mm", sqlSessionFactoryRef = "MMSessionFactory")
public class MMDBConfig extends DBConfigABC { ... }
</pre>
<br />
... not only that, the directory hierarchy matters, due to the niceties of <a href="https://mybatis.github.io/spring/">MyBatis-Spring</a>, since MapperScan will find anything at or below a directory ... so you cannot put mappers for one connection below the point you need to scan for another ... or they will be sucked up as Mappers for the wrong session.<br />
<br />
<h4>
Test configuration</h4>
Also the test configuration for integration tests required a lot of manual annotations, to pick up the appropriate configuration environment and inject things into the context in a way that matches the running code configuration. Again I had been spoilt by expecting a framework specific, automatically working test harness, with default test customisations of the runtime code environment thrown in. Instead each integration test needed to use a wrapper that called a RunnerBeanConfig.<br />
<br />
<pre class="CICodeFormatter">@PropertySource("classpath:test.properties")
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(loader = AnnotationConfigContextLoader.class, classes = ITRunnerBeanConfig.class)
public class ModelFactoryServiceIT extends ServiceABC { ... }
</pre>
<br />
The RunnerBeanConfig then needs to have all the database and resources configs annotated as Imports or it cannot find the different database sessions. Only after all that is it possible to inject the runtime command classes for testing.<br />
<br />
<pre class="CICodeFormatter">@ComponentScan("uk.ac.bris.adt.erp.dataint")
@PropertySource("classpath:test.properties")
@Configuration
@Import({ DataIntDBConfig.class, MMDBConfig.class ... ResourceConfig.class })
public class ITRunnerBeanConfig implements EnvironmentAware {
private Environment env;
@Override
public void setEnvironment(Environment environment) {
this.env = environment;
}
...
}</pre>
<br />
<h3>
Conclusion</h3>
OK, so I got it working in the end. However it all felt rather like I was doing something rather bespoke and complicated. This added a great deal of development time (or more accurately, configuration wrangling time) to deliver components of persistence layer management and testing that I had previously expected would all be available already in a mature high level framework.
So in summary I probably have to conclude that if you need to do this task in a more maintainable and standardised manner, with a minimum of custom code: don't use Spring and MyBatis. However if you already use one of those tools ... and you need to add this sort of functionality with them ... then it is certainly doable, and perhaps if you are a newbie to them, like me, this post may be of some use in speeding the job up for you :-}
</div>
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-89886786957955593482015-07-30T20:17:00.005+01:002015-08-10T15:35:13.650+01:00Picking a language for shell scripting based on its framework ecosystem<br />
<ul>
<li>Do you work somewhere that has a lot of old shell scripts? </li>
<li>Were they all dashed off quickly according to the whims of their original creator - just deploy scripts so maybe they missed some of the care (and documentation) the application code had?</li>
<li>Whenever a major revision of said shell scripts is needed, do all of them end up in the bin - sorry, trash, in case you thought I meant /usr/bin - because it's easier to write such platform and author specific scripts from scratch than to maintain them?</li>
</ul>
<br />
Well, the team I recently joined does. As somebody who personally moved from shell scripts to a shell compatible, object orientated scripting language some time ago, and then on to a shell framework, I was asked to assess what to move our menagerie of bash, zsh, csh, PowerShell etc. scripts to.<br />
<br />
My remit was to recommend a language instead of all those developer specific procedural shell scripts. After all the point of a mainstream language is to introduce cross platform code with a huge set of useful libraries, along with hopefully some standard approaches. Hence get above the sea of shells.<br />
<br />
However my preference is to move the team to using a shell framework.<br />
<br />
Because what a framework does is give you opinions. It chooses how to do stuff, so you ... and all the developers that come after you ... don't have to. In my opinion most of the productivity comes from everyone becoming familiar with the same way of doing things ... whether or not it's the best way<br />
(and if it isn't the best way the framework developers spend a lot of time fixing that way with hopefully the minimum of disruption to its API for the end users).<br />
<br />
I felt it would be useful to see how active the shell framework ecosystem of a language was, in order to assess how suited to shell scripting it is. Since the goal of a shell framework is essentially the same goal, standardisation of shell scripts. So one indicator for the most suited language is to list the mainstream shell frameworks for the contenders and see which comes out top.<br />
<br />
It was decided we would choose from Javascript, Python or Perl. This is for a team that almost exclusively uses Java, which due to its pre-compiled, statically typed nature is inherently poorly suited to shell scripting.<br />
Unfortunately Ruby was not to be a candidate. However I have added it anyway, because if it's good enough for Amazon, and for leading configuration management frameworks such as Puppet ... it should be included.<br />
<br />
Strangely it is difficult to find definitive lists of shell frameworks ... perhaps, whilst I thought the term had been around for 10 years or more, it has not been coined very effectively ... it's not even on Wikipedia ... maybe I made it up! Anyway they exist whether or not there is a proper name for them ... they also often cross over the grey area into being full blown configuration management tools.<br />
So to avoid that argument ... I will add those in as well. I have marked entries with a * where I think it is more a config management tool than a shell framework.<br />
<br />
For configuration management systems <a href="https://en.wikipedia.org/wiki/Comparison_of_open-source_configuration_management_software">wikipedia does have an entry</a> ... so using it as a similar benchmark it gives a relative language score of:<br />
<br />
Python 11, C 5, Perl 5, Java 4, Ruby 3,<br />
and 1 each for C++, Scala, Erlang, Go and PHP (none for Javascript)<br />
<br />
So on the same sort of basis I have ranked the languages with the most healthy shell framework ecosystem ... in the belief that it is a good indication that they may be the most suited to being used for shell scripting ...<br />
<br />
<h4>
Python 8</h4>
Python has the largest number of well established shell frameworks, as it does in terms of configuration management systems. So on that basis I could award it first prize as the most used language for standardising shell scripting. However numbers are not everything: it doesn't necessarily have the leaders of those two categories of software (see Ruby below).<br />
<ol>
<li><a href="http://www.fabfile.org/">fabric</a></li>
<li><a href="http://docs.openstack.org/developer/cliff">cliff</a></li>
<li><a href="http://python-deploy-framework.readthedocs.org/">python-deploy-framework</a></li>
<li><a href="http://cea-hpc.github.io/clustershell">clustershell</a></li>
<li><a href="http://builtoncement.com/">builtoncement</a></li>
<li><a href="http://docs.ansible.com/">ansible</a> *</li>
<li><a href="http://saltstack.com/">salt</a> *</li>
<li><a href="http://bcfg2.org/">bcfg</a> *</li>
</ol>
<h4>
Ruby 6</h4>
<div>
Ruby has Capistrano, which is possibly the most popular of all shell frameworks; it also has two of the most popular configuration management systems, Puppet and Chef. So on that basis it should perhaps be first, or at least joint first with Python, as the most appropriate language for modern shell scripting.</div>
<div>
<ol>
<li><a href="http://capistranorb.com/">capistrano</a></li>
<li><a href="https://github.com/commander-rb/commander">commander</a></li>
<li><a href="https://github.com/mina-deploy/mina">mina</a></li>
<li><a href="http://rubyhitsquad.com/Vlad_the_Deployer.html">vlad the deployer</a></li>
<li><a href="https://puppetlabs.com/">puppet</a> *</li>
<li><a href="https://www.chef.io/">chef</a> *</li>
</ol>
</div>
<h4>
Perl 3</h4>
<div>
Perl has things in related areas that perhaps the other languages don't. So a number of <a href="https://metacpan.org/release/PerlPowerTools">shell implementation</a>s in Perl, and frameworks for writing custom shells. But in terms of what I mean by a shell framework - ie. a tool to run shell commands in a standard way across different servers, shells and platforms - it is surprisingly lacking.</div>
<div>
<ol>
<li><a href="https://metacpan.org/pod/Shell::Tools">Shell::Tools and Shell::Tools::Extra</a></li>
<li><a href="https://www.rexify.org/">Rex</a> *</li>
<li><a href="http://www.quattor.org/">quattor</a> *</li>
</ol>
</div>
<h4>
Javascript 2</h4>
Ummm, well, Javascript may have broken free of the browser with node.js and other server side runtime environments. But are there any shell frameworks or configuration management systems built with it - ie. tools designed for running shell commands and doing anything other than Javascript installation? Not much TBH, and certainly no configuration management engine.<br />
<ol>
<li><a href="https://github.com/tj/commander.js">commander.js</a> (clone of Ruby commander)</li>
<li><a href="https://www.npmjs.com/package/shelljs">shelljs</a> is the leading bunch of utilities for node.js that can act as a shell framework</li>
</ol>
<br />
<br />
<h3>
Conclusion</h3>
It's a tie between Python and Ruby, so the decision as to which to go for depends on the individual tools that best suit your shell framework requirements, what they may need to integrate with wrt. configuration management and deployment tools ... plus the existing skills of your developers.<br />
Oh and whether diversity is more important than being the market leader ... or vice versa.<br />
<br />
For my own case, Ruby isn't in the running, so on this premise that makes it Python.<br />
I may follow up this blog post with one comparing the languages wrt. a sample shell script - to see which looks the most maintainable, as a second means of deciding between them.<br />
<br />
<b>Note</b>: One other factor is related virtual frameworks ( ... yep, made that term up too!)<br />
Basically these wrap up the configuration and deployment of virtualised software - either hypervisors or containers - to give a full platform application build, usually for development purposes.<br />
So here Ruby is well ahead, with <a href="https://www.vagrantup.com/">Vagrant</a> along with <a href="http://boxgrinder.org/">BoxGrinder</a>.<br />
<h3>
</h3>
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-5407500677931814702014-11-16T13:34:00.003+00:002014-11-16T13:42:16.593+00:00The 10 commandments of maintainable web servicesHere is a list of the ten core elements needed for a development to deployment phase infrastructure to provide a stable service for your web applications, while minimising time wasted on bugs and issues unrelated to functional development, and slashing maintenance time and cost compared to systems without them. I guess it could also be called automation, automation, automation ...<br />
<br />
It should be noted that just because an application is a legacy one, it does not mean that all of this infrastructure cannot be retro-fitted to it. *<br />
<ol>
<li><b>Standard environment</b><br />A set of consistently built and upgraded deployment phase environments - dev, demo, train, prod for the full application stack - e.g. app server, cache, web server and storage. All development and deployment is done on these entirely standard (ideally config management / virtualised) cloned environments. If random desktop / laptop computers must be used then ideally a virtual box build version should be provided for dev, to match the deployment ones.<br />For web applications the server side will be a single environment, but if client side software is involved it may require multiple standard environments for build and test.</li>
<li><b>Automated build</b><br />Run one command or press one button to create a full application stack instance on any of the deployment environments. Including production. So this should be everything above the standard environment and ideally include storage too (see data automation). Each developer can build numbers of deployment instances in the same automated fashion. Builds should be remotely runnable for plugging into Continuous Integration, C.I., servers etc.</li>
<li><b>Automated release management</b><br />Particularly important is that no manual tasks are needed to deploy to production. A push button C.I. driven deploy should be used, where each deployment is retained in a full log, accompanied by a summary deployment note and the related software packages' release history and source tag. This full logging of changes ties into software service change management concepts. If unforeseen dependency system issues develop a lot later, they can then potentially be tied to the highly detailed timestamped change logging that this provides.<br />Automating the roll-out means that you should also automate the rollback. You will hopefully test well enough not to need a safety net, but not bothering to use one is reckless.<br />Another common loophole is that release only covers the application layer. The standard environment, storage etc. are all part of the stack, and changes in them are also releases, and need the same release management controls in place.</li>
<li><b>Revision control of the entire application stack</b><br />Everything in the application should be versioned. So all the source code of course. But also all the deployment and automation code. The third party components should all have their own versions (if not download, version and deploy from your own local repository). That includes the application specific environment configuration, eg. Apache virtual host configuration.<br />Build automation should allow specification of tags (or to a date or to any previous release - logged via C.I.)<br />The code dependency stack should also be versioned - so the versions of every component for a system release. Language specific build tools such as Maven, Pip, Ant, Phing, Bundler, Buildout etc. provide this. The standard environment(s) should also be versioned via their config management tool. </li>
<li><b>Integrated documentation</b><br />Core documentation should be written and versioned with the source code, each package should at least have README and a release HISTORY tied to each production releases version number. These need to be kept up to date with the rest of the source. Separate wikis for fuller / less technical docs are fine - but documentation of changes in functional specification need to use the same version control as the code - unless all your code has rigorous processes around a versioning integrated issue tracker - that is most reliably done by putting documentation in the code.<br />Ideally the language's packaging tools should have a system to extract embedded documentation and comments into HTML on a software repository server - for easy reference.<br />Automation to keep the web documentation up to date should be implemented. </li>
<li><b>Software upgrade process</b><br />Major version platform upgrades should always be performed within a year of the release date, and not just for security patch reasons (those must be carried out within a month at most). Ideally the former within a few months and the latter within a few days. Any longer and code divergence can make the upgrade hill too big a cost to scale, or compromise systems and data. Major language / framework upgrades (as well as releases) should not require significant system outages. These may not be automated to set up, but they should be automated to roll over between upgrades - so even if you are without a multi-server load balanced layer in part of your application stack, downtime should still be under a minute at most, e.g. an Apache or database restart.</li>
<li><b>Automated testing</b><br />It may not provide great coverage, but a minimal test suite is a necessity to allow confirmation of success for the automation infrastructure.<br />Good test coverage means that complex functional errors or regressions can be written as tests and added to periodic builds - so ensuring that future releases are free of them - but a set of minimal functional or black box tests is sufficient to cover basic confirmation that automated environment upgrades, or minor application fix releases, do not cause critical failures. These tests can also be tied to monitoring / timed load testing - to check upcoming releases for performance regressions.</li>
<li><b>Data automation</b><br />This involves data fixtures, automated schema generation and synchronisation.<br />With an object relational mapper (ORM) now standard in web applications, your system should have a full data abstraction layer, even in the most micro of web frameworks. In turn that means today's application code should contain within it the means to generate all of the data layer. Ideally ORMs should fully abstract the database implementation, generate that implementation for a range of RDBMSs, and generate data fixtures for it - for building populated new development instances or for testing.<br />As standard, the test harness will set up and tear down the data layers.<br />More mature ORMs will also have schema migration tools. These are essential for fully automated release management, since invariably a significant release will involve a change to the data schema, or at least a new entry in the database. A synchronisation tool will tend to use meta-programming to automatically generate the migration code that synchronises the schema - that migration is then released (or rolled back) as part of the code release - keeping the data storage in the release management loop. Any data modification (DML) that the application requires can be added to the DDL of the schema migration. These tools will also have introspection code to detect that data migration is required when connected to a previous version of the database. Bespoke applications may not have such a tool, but at worst they should have data creation and migration code written and packaged with newly released versions - manual database tinkering around the time of a code release is not acceptable.</li>
<li><b>Package management</b><br />Application layer package management will always be language specific, but any language should offer it. Ideally a package repository should be maintained for each language your services use. These may be core to the language, like PyPI and RubyGems, or, for languages without them in the core, commercial offerings such as Nexus for Java.<br />This caters for version dependency management and reliable upgrade. Of course, to use a package manager fully you should package all your application source code. Ad hoc scripts, framework app archives, raw class and resource bundles etc. - just say no. If you are going to release your code rather than chuck it over the wall ... package it and version it. So all your code should be in jars, eggs, gems - or whatever your language likes to call them.<br />Not only that, you should apply the same rules to splitting up packages as you apply to splitting up code into classes. Some packages may depend on others, but each separate component of the application should be a different package - to allow it to be separately version controlled and released, to encourage encapsulation, and hence to allow packages to be reused, retired or replaced without replacing the whole application's code base.<br />(NB: Environment package management will be operating system specific and should be implemented as part of the standard environment config management layer - no building from source here!)</li>
<li><b>Monitoring</b><br />One of the most important issues with logging and error notification is the cry-wolf factor. You need to draw the line in the right place for what counts as a critical error - ie. one that generates notifications to people. Over-reporting is fine initially if it makes you hammer down on all those bugs to reach a reasonable level. But the one thing that makes monitoring ineffective is sustained over-reporting: if a system has been emailing you a hundred stack traces a day for the last month - or the critical log is equally verbose - you filter the emails and ignore the log. Critical bug notifications need to be rare enough that you jump straight on fixing them when they arrive. On the other hand, don't overdo the filtering: ideally you should never be in the position where the only way you learn that a service is down is an end user phoning up to tell you. Good monitoring will always catch failures first, for all but the most involved functional errors.<br />You also need standard uptime monitoring, such as Nagios or the like, to notify you when services have failed completely (and so are unable to send application layer errors) for each of the layers - web, storage, cache, environment.<br />Plus load logging, response time logging, etc. for each. Most importantly, you need to retain the logging over time so you can look back at problems against change management data (see automated release management) to diagnose many service issues and, ideally, predict and forestall them.</li>
</ol>
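The data automation commandment above can be sketched in miniature: a release-managed migration is just DDL plus any required DML, applied (and rolled back) as one unit. A minimal illustration using Python's built-in sqlite3 module - the table and column names are hypothetical, and a real ORM migration tool would generate the equivalent of this function by introspection:

```python
import sqlite3

def migrate_0004_add_slug(conn):
    # One release's migration: DDL (new column) plus DML (back-fill),
    # run in a single transaction so it commits or rolls back as a unit.
    with conn:
        conn.execute("ALTER TABLE article ADD COLUMN slug TEXT DEFAULT ''")
        conn.execute("UPDATE article SET slug = lower(replace(title, ' ', '-'))")

# Build a populated development instance, then apply the migration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO article (title) VALUES ('Hello World')")
migrate_0004_add_slug(conn)
```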
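And the minimal black-box check from the automated testing commandment can be as small as this sketch - the health-check function is a hypothetical stand-in for a real HTTP request against the deployed service:

```python
import unittest

def service_status():
    # Stand-in for an HTTP health-check request against the deployed service
    return 200

class SmokeTest(unittest.TestCase):
    # Minimal black-box confirmation that an environment upgrade or
    # minor fix release has not caused a critical failure.
    def test_service_up(self):
        self.assertEqual(service_status(), 200)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```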
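The cry-wolf threshold from the monitoring commandment maps directly onto standard log levels - everything is recorded, but only CRITICAL reaches people. A sketch using Python's logging module, with a stub handler standing in for a real SMTPHandler or pager integration:

```python
import logging

notifications = []

class NotifyHandler(logging.Handler):
    # Stand-in for an email/pager handler: only rare, genuinely
    # critical errors should ever reach people.
    def emit(self, record):
        notifications.append(record.getMessage())

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.addHandler(logging.NullHandler())            # real setup: a file handler
logger.addHandler(NotifyHandler(level=logging.CRITICAL))

logger.error("routine stack trace")                 # logged, but nobody is paged
logger.critical("service down")                     # this one notifies
```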
<h4>
Walk the walk</h4>
<div>
So do I have the ten commandments in place for all our production systems in my current work place? In part, we have for all our Python Django web applications (although some are a bit sparse in places - eg. monitoring, release management below the application layer). But our Java architecture only has packaged components, although work is being done for new Java Spring systems to provide automated build, ideally some tests and the need for monitoring is recognized. Hopefully we will tick all ten boxes for it too, eventually. So we will have as solidly maintainable a Java Spring platform as we have with our Python Django infrastructure.</div>
<div>
<br /></div>
<div>
However the concern is perhaps as much with all our legacy or outsourced systems integration code. This has none of these components and no realistic likelihood of getting them. A huge support burden results, diverting time away from providing them and leading to unreliable services. Add the problem of platforms that are frozen whilst still in use - as with our legacy Python (Zope) architecture - which then rot and lose the maintenance infrastructure they had (our old CMS went live with half of the above features; now it has none), and the picture becomes a little bleak. Here the answer is perhaps to implement much more hard-nosed rules for retiring systems that have replacements, whether or not those replacements fully cover the same functional space. Essentially this is a management issue, not a technical one.</div>
<div>
<br /></div>
<div>
With a much reduced set of critical legacy systems and appropriate resourcing it would be possible to retrograde add the commandments to them, and bring all services up to a similar quality control.<br />
<br />
However the problem is greatly exacerbated by 'new' legacy bought-in systems. By this I mean third party supplier systems that we run and have to maintain (eg. regular upgrades, performance monitoring etc.) that do not have most of the above features. Unfortunately that appears true of all the smaller suppliers' systems procured recently - ie. companies with under 10 core developers. Perhaps because most of them are providing products that actually are legacy, ie. have not been written, or fully rewritten, in the last 6 years (for the full rant on this topic see the <a href="http://edcrewe.blogspot.co.uk/2013/11/the-ten-commandments-of-software.html">ten commandments of software procurement</a>!)<br />
<br /></div>
<h4>
* Fixing the legacy and external systems</h4>
<div>
There are plenty of configuration management and shell framework tools that can be applied to automate even the messiest old legacy systems. The key rule here is you don't need to write any of the infrastructure in the legacy code base. So use your standard CI server, shell framework and config management tools - don't add more procedural platform specific code (e.g. raw shell scripts).<br />
Modern automation tools should all be pretty platform independent - although if running Windows and Unix you may be better using a different shell framework for each, eg. Fabric and PowerShell, possibly the same for config management tools.<br />
<br />
If the code contains closed source compiled components with no versioning, then the binaries can still be put into version control and release numbers assigned. At worst, decompilation tools can be used - if there is no other reasonable way to fix or replace the components.<br />
<br />
Similarly black-box testing tools can be applied to any software, and if none of the technical team know what that code is doing - end users can provide a basic functional spec of what it's meant to do, and these few basic stories can be used to create some minimal BDD tests.<br />
Data in / data out dumps and comparisons can also be used as a basis for manually maintained fixtures. Legacy components can be split up and packaging added to them ... but much more work along this line of legacy code refactoring and we start to raise the question of whether respecify / rewrite / replace would be more cost effective.</div>
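The data in / data out approach can be as simple as a golden-file check: capture known input/output pairs from the live system, then assert that the untouched legacy component still reproduces them after any change around it. A sketch, where the transform function is a hypothetical stand-in for the black box:

```python
def legacy_transform(text):
    # Hypothetical stand-in for the legacy black-box component under test
    return text.upper()

# Input/output pairs captured ("dumped") from the production system
golden = {"hello": "HELLO", "world": "WORLD"}

def check_against_golden():
    # Regression check: the black box must still match its recorded behaviour
    return all(legacy_transform(given) == expected
               for given, expected in golden.items())
```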
<h3>Fixing third party Django packages for Python 3</h3>
<i>Ed - 2014-10-29</i><br />
With the release of Django 1.7 it could be argued that the balance has finally tipped towards Python 3 being its preferred platform. Given that Python 2.7 is the last of the 2.* series, it is probably time we all thought about moving to Python 3 for our Django deployments.<br />
<br />
Problem is those pesky third party package developers, because unless you are a determined wheel reinventor (unlikely if you use Django!) you are bound to have a range of third party eggs in your Django sites. As one of those pesky third party developers myself, it is about time I added Python 3 compatibility to my Django open source packages.<br />
<br />
There are a number of resources related to porting <a href="https://docs.python.org/3/howto/pyporting.html">Python from 2 to 3</a>, including some specifically for <a href="https://docs.djangoproject.com/en/1.7/topics/python3/">Django</a>, but hopefully this post may still prove useful as a summarised approach for your Django projects or third party packages. With luck it isn't too much work, and if you have been writing Python as long as me it may also get you out of any legacy syntax habits.<br />
<br />
So let's get started. The first thing is to set up Django 1.7 with Python 3.<br />
For repeatable builds we want pip and virtualenv - if they are not there already.<br />
On a Linux platform such as Ubuntu you will have python3 installed as standard (although not yet the default python), so if you just add pip3 that lets you add the rest ...<br />
<br />
<h4>
Install Python 3 and Django for testing</h4>
<div>
<br /></div>
<code><span style="background-color: white;">sudo apt-get install python3-pip<br />(OR sudo easy_install3 pip)<br />sudo pip3 install virtualenv</span><br />
</code>
<br />
<br />
So now you can run virtualenv with python3 in addition to the default python (2.*)<br />
<code><br /></code>
<code>
virtualenv --python=python3 myenv3<br />
cd myenv3<br />
bin/pip install django<br />
</code><br />
<code><br /></code>
Then add a src directory for putting in the egg you want to make compatible with Python 3 -
so an example from git (of course you can do this as one pip line if the source is in git)
<code><br /><br />
</code><br />
<code>mkdir src</code><br />
<code>git clone https://github.com/django-pesky src/django-pesky <br />
bin/pip install -e src/django-pesky <br /><br />
</code><br />
Then run the django-pesky tests (assuming nobody uses an egg without any tests!)<br />
so the command to run pesky's tests may be something like the following ...<br />
<br />
<code>
bin/django-admin.py test pesky.tests --settings=pesky.settings</code><br />
One rather disconcerting thing you will notice with tests is that the default assertEqual message is truncated in Python 3 where it wasn't in Python 2, with a count of the missing characters given in square brackets, eg.<br />
<code><br /></code>
<code>AssertionError: Lists differ: ['Failed to open file /home/jango/myenv/sr[85 chars]tem'] != []</code>
<br />
<code><br /></code>
<br />
<h4>
Common Python 2 to Python 3 errors</h4>
<div>
<br /></div>
And wait for those errors. The most common ones are:<br />
<br />
<ol>
<li>print statement without brackets</li>
<li>except Error as err (NOT except Error, err)</li>
<li>File open and file methods differ.<br />Text files require explicit, better quality encoding - and more file content defaults to bytes, because strings in Python 3 are all stored as unicode<br />(on the down side this may need more work for initial encoding clean up *,<br />but on the plus side functional errors due to bad encoding are less likely to occur)</li>
<li>There is no unicode() built-in in Python 3, since all strings are now unicode - ie. it has become str(), and hence strings no longer need the u'string' marker</li>
<li>Since unicode is not available as a method, it is not used for Django models' default representation. Hence just using<br />def __str__(self):<br />
    return self.name<br />is the future-proofed method. I actually found that models with both __unicode__ and __str__ methods may return no representation at all, rather than the __str__ one being used as one might assume, in Django 1.7 with Python 3</li>
<li>dictionary has_key has gone, must use in (if key in dict)</li>
</ol>
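Most of the errors listed above have a single transitional spelling that runs unchanged under both Python 2.7 and Python 3 - a sketch, where the file path and model class are illustrative only:

```python
from __future__ import print_function, unicode_literals
import io

def read_config(path):
    # Python 3 style open with an explicit encoding (io.open works on 2.7 too)
    try:
        with io.open(path, encoding="utf-8") as f:
            return f.read()
    except IOError as err:        # "except ... as err", not "except ..., err"
        print("could not read", path)
        return ""

class Article(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):            # __str__ only; no __unicode__ needed
        return self.name

settings = {"DEBUG": True}
assert "DEBUG" in settings        # "in", not the removed has_key()
```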
<br />
* I found more raw strings were treated as bytes by Python 3, and these then required raw_string.decode(charset) to avoid going into database string (eg. varchar) fields as pseudo-bytes - ie. strings that held 'élément' as '\xc3\xa9l\xc3\xa9ment' - rather than as real bytes, ie. b'\xc3\xa9l\xc3\xa9ment'<br />
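For example, the pseudo-bytes problem disappears once the raw bytes are decoded at the boundary, before they reach the database layer:

```python
raw = b"\xc3\xa9l\xc3\xa9ment"     # bytes as read from a file or socket
text = raw.decode("utf-8")         # decode once, at the boundary

assert text == "élément"           # real unicode string, not pseudo-bytes
assert isinstance(text, str)       # in Python 3, str is unicode
```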
<div>
<br /></div>
Ideally you will want to maintain one version but keep it compatible with Python 2 and 3, <br />
since this is both less work and gets you into the habit of writing transitional Python :-)<br />
<br />
<h4>
Test the same code against Python 2 and 3</h4>
<div>
<br /></div>
So to do that you want to be running your tests with builds in both Pythons.<br />
So repeat the above but with virtualenv --python=python2 myenv2<br />
and just symlink the src/django-pesky to the Python 2 src folder.<br />
<br />
Now you can run the tests for both versions against the same egg code - <br />
and make sure when you fix for 3 you don't break for 2. <br />
<br />
For current Django 1.7 you would just need to support the latest Python 2.7 <br />
and so the above changes are all compatible except for use of unicode() and how you call open().<br />
<br />
<h4>
Version specific code</h4>
<div>
<br /></div>
However in some cases you may need to write code that is specific to Python 2 or 3.<br />
If that occurs you can either use the approach of trying the latest version first and falling back to anything else (cross fingers)<br />
<br />
<code>
try:<br /> latest version compatible code (e.g. Python 3 - Django 1.7)<br />
except (ImportError, AttributeError):<br /> older version compatible code (e.g. Python 2 - Django < 1.7)<br />
</code><br />
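A common concrete case of this try/except approach is feature detection by import - the standard library modules renamed between Python 2 and 3 are the classic example:

```python
# Try the newest spelling first, fall back for older versions
try:
    from urllib.parse import urlparse      # Python 3
except ImportError:
    from urlparse import urlparse          # Python 2

parts = urlparse("http://example.com/pesky?page=1")
```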
Or you can use specific version targetting ...<br />
<code><br /></code>
<code>
import sys, django<br />
django_version = django.get_version().split('.')<br />
<br />
if sys.version_info.major == 3 and int(django_version[1]) == 7:<br /> latest version<br />
elif sys.version_info.major == 2 and int(django_version[1]) == 6:<br /> older django version<br />
else:<br /> older version</code><br />
<code><br />
</code>
where ...<br />
<br />
django.get_version() -> '1.6' or '1.7.1'<br />
sys.version_info -> sys.version_info(major=3, minor=4, micro=0, releaselevel='final', serial=0)<br />
(a named tuple - access it as sys.version_info.major or sys.version_info[0], not as a dictionary)<br />
<br />
<h4>
Summary</h4>
So how did I get on with my first egg, <a href="https://pypi.python.org/pypi/django-csvimport">django-csvimport</a> ? ... it actually proved quite time consuming since the csv.reader library was far more sensitive to bad character encoding in Python 3 and so a more thorough manual alternative had to be implemented for those important edge cases - which the tests are aimed to cover. After all if a CSV file is really well encoded and you already have a model for it - it hardly needs a pesky third party egg for CSV imports - just a few django shell lines using the csv library will do the job.<br />
<div>
<br /></div>
<br />
<br />
<br />
<br />
<br />
<br />
<h3>Spring MVC setup on Ubuntu</h3>
<i>Ed - 2014-07-03</i><br />
Recently setting up Spring MVC on Ubuntu 14 with NetBeans wasn't entirely obvious for a newbie, so I thought I would document it in case it saves somebody 10 minutes!<br />
<br />
<br />
First install Apache and Tomcat, if you haven't got them already...<br />
<br />
sudo apt-get install apache2<br />
<br />
<br />
sudo apt-get install tomcat7 tomcat7-docs tomcat7-admin tomcat7-examples<br />
<br />
You should also have the default openjdk for tomcat and ant build tool and git <br />
<br />
sudo apt-get install default-jdk ant git<br />
<br />
Edit tomcat-users.xml - NetBeans requires a user with the manager-script role<br />
(NOTE: you shouldn't give the same user all these roles in a production Tomcat!<br />
Also note that these manager roles have changed from Tomcat 6)<br />
<br />
sudo emacs /etc/tomcat7/tomcat-users.xml <br />
<br />
<br />
<pre style="-webkit-text-stroke-width: 0px; background-color: #eeeeee; border: 0px; color: black; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 17.804800033569336px; margin-bottom: 10px; margin-top: 0px; max-height: 600px; orphans: auto; overflow: auto; padding: 5px; text-align: start; text-indent: 0px; text-transform: none; vertical-align: baseline; white-space: pre-wrap; widows: auto; width: auto; word-spacing: 0px; word-wrap: normal;"><code style="border: 0px; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; margin: 0px; padding: 0px; vertical-align: baseline; white-space: inherit;"><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">tomcat</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">-</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">users</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-gui"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-script"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-jmx"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-status"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"admin-gui"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">role rolename</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"admin-script"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">user username</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"admin"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"> password</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"admin"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"> roles</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">=</span><span style="background-color: transparent; border: 0px; color: maroon; margin: 0px; padding: 0px; vertical-align: baseline;">"manager-gui,manager-<wbr></wbr>script,manager-jmx,manager-<wbr></wbr>status,admin-gui,admin-script"</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"><wbr></wbr>/></span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">
</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;"></</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">tomcat</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">-</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">users</span><span style="background-color: transparent; border: 0px; margin: 0px; padding: 0px; vertical-align: baseline;">></span></code></pre>
<br />
<br />
Should restart tomcat after editing this ...<br />
<br />
sudo service tomcat7 restart<br />
<br />
Now you should be able to go to http://localhost:8080 and see<br />
<br />
<pre style="-webkit-text-stroke-width: 0px; background-color: #cceeee; border: 0px; color: black; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 17.804800033569336px; margin-bottom: 10px; margin-top: 0px; max-height: 600px; orphans: auto; overflow: auto; padding: 5px; text-align: start; text-indent: 0px; text-transform: none; vertical-align: baseline; white-space: pre-wrap; widows: auto; width: auto; word-spacing: 0px; word-wrap: normal;"><h1>
It works !</h1>
If you're seeing this page via a web browser, it means you've setup Tomcat successfully. Congratulations! ...
</pre>
<br />
Click on the link to the manager and get the management screen<br />
<br />
If the login fails - reinstall apache and tomcat - it worked for me!<br />
<br />
For NetBeans to find Tomcat OK you have to put the config directory where it expects it ...<br />
<br />
sudo ln -s /etc/tomcat7/ /usr/share/tomcat7/conf<br />
<br />
Note that the Tomcat location, ie. the deploy directory, is in<br />
<br />
/var/lib/tomcat7<br />
<br />
Now install Netbeans, latest version is 8, either by download and install or <br />
<br />
sudo apt-get install netbeans<br />
<br />
Start up netbeans and go to Tools > Plugins<br />
<br />
Pick the Available plugins tab<br />
<br />
Search for web and tick Spring MVC - plus any others you fancy!<br />
<br />
Restart Netbeans<br />
<br />
Add a new project<br />
<br />
<ol style="-webkit-text-stroke-width: 0px; background-color: white; color: #333333; font-family: Arial, Helvetica, Verdana; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.799999237060547px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">Choose New Project (Ctrl-Shift-N; ⌘-Shift-N on Mac) from the IDE's File menu. Select the Java Web category, then under Projects select Web Application. Click Next.</li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">In Project Name, type in<span class="Apple-converted-space"> </span><b>HelloSpring</b>. Click Next.</li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">Click the Add... button next to the server drop down </li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;"><div style="margin: 0px; padding: 0px 0px 3px;">
Select the Apache Tomcat or TomEE server in the Server list, click Next</div>
<div style="margin: 0px; padding: 0px 0px 3px;">
Enter Server Location: /usr/share/tomcat7 </div>
<div style="margin: 0px; padding: 0px 0px 3px;">
Enter the username and password from your tomcat-users.xml above and untick the create user box, if everything is working then it will accept this and add Tomcat to your server drop down list </div>
<div style="margin: 0px; padding: 0px 0px 3px;">
(it shouldn't need to try to add the user unless that user isn't already properly set up with the manager-script role in Tomcat) </div>
</li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">In Step 4, the Frameworks panel, select Spring Web MVC.</li>
<li style="list-style: decimal; margin-bottom: 9px; margin-top: 0px;">Select<span class="Apple-converted-space"> </span><b>Spring Framework 3.x</b><span class="Apple-converted-space"> </span>in the Spring Library drop-down list.<span class="Apple-converted-space"> </span><br /><img alt="Spring Web MVC displayed in the Frameworks panel" class="margin-around b-all" src="https://netbeans.org/images_www/articles/80/web/spring/frameworks-window.png" style="border: 1px solid rgb(173, 173, 173); margin: 10px;" title="Spring Web MVC displayed in the Frameworks panel" /></li>
</ol>
<br />
Click Finish and you should have a skeleton Spring MVC project, pressing the Play button should build it and run it up, then launch your chosen browser with the home page of that project via the Apache Tomcat you have setup.<br />
Any changes should get auto-deployed and popped up in the browser again by pressing play.<br />
<br />
<br />
<h3>Lessons learned from setting up a website on Amazon EC2</h3>
<i>Ed - 2014-05-02</i><br />
I recently got involved with helping someone sort out their website on an Amazon EC2 instance. It had been a few years since I had needed to do anything with EC2, and I realised I was a novice in this world - it raised a number of issues related to deploying to EC2 and performance.<br />
<br />
So I thought it may be useful to run through them for any other EC2 novices who are asked to do something similar, and want to learn from my rather blundering progress through this :-) <br />
<br />
Apologies to those of you who are already well familiar with EC2 for covering some of the basics.<br />
<br />
The system <a href="http://moodpin.co.uk/">moodpin.co.uk</a> was based on a commercial PHP application, Pintastic.<br />
This allows you to set up a site like <a href="http://pinterest.com/">pinterest.com</a> or <a href="http://wanelo.com/">wanelo.com</a>.<br />
These sorts of sites are for creating subject-specific photo sharing social media systems - like Instagram, Picasa etc. but focussed around communities of shared (usually commercial) interest, for example buying shoes or interior decor.<br />
The common UI they tend to present is big scrolling pages of submitted images related to topics, for sharing, comment and discussion.<br />
<br />
So this system sends out a lot of notification emails, involves displaying hundreds of images per page - the visual pin board - and to help with performance has custom caching built in - triggered by cron jobs.<br />
<br />
Hence we have a number of cron jobs with the caching ones running every couple of minutes. To me this appeared a pretty crude caching mechanism - but my job was not to rewrite the application, but just tweak the code and get it all running OK.<br />
The code mainly uses a standard MVC approach like everything else these days!<br />
<br />
Demonstrating how outdated my knowledge of both EC2 and this application was, I thought OK - first of all, what platform is it? It was Amazon's own <a href="http://aws.amazon.com/amazon-linux-ami/2012.09-release-notes/">Linux </a>- this uses yum rather than apt for package installs, so as distros go it's perhaps more Redhat-like than Debian-like.<br />
<br />
For those unfamiliar with the basics - go to <a href="http://aws.amazon.com/">Amazon web services</a> and sign up!<br />
You can then choose to add some of the 40-odd different services that are available under the AWS umbrella.<br />
<br />
Once you have signed up to a few of these, you get a management console that links to a control dashboard for each service. The first step is usually the one with the compute instances on: EC2. From there you can pick an AMI (ie. an operating system image) and a zone - eg. <b>US West (Oregon)</b> - and use them to create a new instance. Add an SSH key pair for shell access, fire it up, and download the pem file so you can ssh into your new Amazon box.<br />
<br />
So the client wanted the usual little tweaks to PHP code and CSS - easy stuff, it's just web development ... done in a jiffy (well, after digging through the MVC layers, templating language, cache issues, CSS inheritance etc. of a fairly complex PHP app you have never come across before, when PHP is not exactly your favourite language ... jiffyish, maybe) <br />
Then we got to the more SysAdmin-related requests ... let's just say I probably shouldn't rush out and buy a DevOps tee-shirt just yet ...<br />
<br />
<b>'Get email working</b>'<br />
<br />
<ol>
<li>Try to send an email from the web application; write a plain PHP script that just sends a test email; run mail from the Linux command line ... Got it, there is no MTA installed! </li>
<li>Install an MTA - sendmail. Go back up that stack of actions and they are all working ... hurray that was easy.</li>
<li>A week or so later ... 'emails stopped working'</li>
<li>Go back to step 1. and yep - emails stopped working</li>
<li>Look at the mail logs and see what the problem is.</li>
<li>Realise that there are masses of emails being sent out ... but all of it is bouncing back as unverified.</li>
<li>Think ... wow that pintastic site's notifier is busy - must be getting lots of traffic *</li>
<li>So why has Amazon started bouncing all the email?</li>
<li>Search Amazon's docs. Amazon has a very minimal test quota allowed for email. Once that quota is filled, unverified email will be blocked.</li>
<li>Amazon has historically been one of the main sources of SPAM machines; that history means it has had to set up a much more elaborate mechanism for validating email than most hosting companies, and it no longer allows direct emailing from EC2 boxes (apart from minimal test quotas)</li>
<li>So what we need to do is set up our mail to be sent via the Amazon SES service - add SES service and enable it</li>
<li>So now we need to send authorised emails to the Amazon SES gateway that will then forward them on to the outside world</li>
<li>Try to get sendmail to send authenticated emails, follow guide but it continues to bounce with authentication failure, give up and install <a href="http://docs.aws.amazon.com/ses/latest/DeveloperGuide/postfix.html">postfix</a>, follow the 20 steps of setting up the SASL password etc., and eventually it doesn't bounce with authentication errors - hurray!</li>
<li>But the email still bounces. So we need to verify all our sending email addresses - managed by the SES console - or use DKIM to get the whole domain verified and signed from which we are sending.</li>
<li>Modify the emails used by the sending software to ones which we can receive and validate - send and validate them. Our emails are working again.</li>
<li>Leave it a few days, we are not sending email anymore, boooo!</li>
<li>Check all the SES documentation, surprise, surprise SES also has quota limits for test level only, and you have to formally apply to get those limits lifted.</li>
<li>Contact the client and get him to make a formal request for quota lifting on his account.</li>
<li>*As part of the investigation check that email log a little more closely, it seems rather large, and we seem to be using up our quotas really quickly ... ah the default setup for unix cron sends an email for every job that returns text. The pintastic cache job returns text, so we are sending a pointless email every two minutes ... or trying to ... whoops. Make sure no cron or other unix system command is acting as a SPAM bot. </li>
<li>A few days later - Amazon say our quota has been lifted</li>
<li>Our emails have started sending again ... and they are still sending today !!!</li>
</ol>
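For reference, the postfix-to-SES relay that steps 12-13 eventually arrived at boils down to a few lines in /etc/postfix/main.cf. This is a minimal sketch rather than a drop-in config - the endpoint shown assumes the US East SES region (substitute your own region's endpoint), the CA bundle path varies by distro, and /etc/postfix/sasl_passwd must hold the SMTP credentials that SES issues you:

```
# /etc/postfix/main.cf - relay all outbound mail via Amazon SES
relayhost = [email-smtp.us-east-1.amazonaws.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_tls_security_level = encrypt
smtp_tls_CAfile = /etc/ssl/certs/ca-bundle.crt
```

After editing, run postmap /etc/postfix/sasl_passwd and restart postfix - and remember that the sending addresses (or the DKIM-signed domain) still have to be verified in the SES console before anything leaves the building.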
Client's response: OK thanks - by the way, since we added all the start-up data (ie. uploaded images), the site takes at least two minutes to render the home page - or times out altogether. <br />
Hmmm, I did kinda notice that ... but hey, he hadn't asked me to make the site actually usable speed-wise ... until now!<br />
<br />
<b>'Why is the site, really, really slow?'</b><br />
<br />
<br />
Hmmm, wow, it really is slow; lots of the time it just dies. That PHP cache thingy can't be doing much, so what's the problem?<br />
<br />
<ol>
<li>Let's look at the web site; wow, it takes 5 minutes for the page to come back ... so this isn't exactly Apache bench territory ... run up a few tabs looking at the home page ... and it starts just returning server timeouts. </li>
<li>So what's happening on the server ... what's killing the box? top tells us that it's Apache killing us here - with 50-odd processes spawning and sucking up all memory and CPU.</li>
<li>So we check out our Apache config and it's the usual PHP-orientated config of MPM prefork. But what are the values set to ... they are for a great big multiprocessor cadillac of a machine, whilst ours is more of a smart car in its scale. </li>
<li>The lesson is that Amazon AMIs are certainly not smart enough to ship different image configs for the different hardware specs of the instances they run on. It appears they default their configs to suit the top-of-the-range instances (since I guess those cost the most). If you have a minimal hardware spec box ... you should reconfigure hardware-related parameters for the software you run on it ... or potentially it will fail.</li>
<li>Slash all those servers, clients etc. values to the number of servers and processes the server can actually deliver. Slightly trial and error here ... but eventually we got MaxClients 30 instead of 500 etc. and give it a huge timeout.<br /><br /><IfModule prefork.c><br />
StartServers 4<br />
MinSpareServers 2<br />
MaxSpareServers 10<br />
ServerLimit 30<br />
MaxClients 30<br />
MaxRequestsPerChild 4000<br />
</IfModule></li>
<li>Now let's hammer our site again ... hurray, it doesn't completely fall over ... one day it may return a page, but it's still horribly, horribly slow, ie. 3 minutes absolute top speed - and the more home page requests, the slower they get.</li>
<li>So let's get some stats; access the page with the browser web dev network tools. What's taking the time here? Hmmm, web page a second - not great but acceptable; JS and CSS 0.25 sec, OK. Images, hmmm, images ... for the home page particularly ... 3-6 minutes ... so basically unusable. </li>
<li>So time to bite the bullet: we know Apache can be slower at serving static pages if it's not optimised for it - especially if resources are limited (its processes have a bigger memory overhead); that's why the Apache foundation has another web server, <a href="http://trafficserver.apache.org/">Apache Trafficserver</a>, for that job</li>
<li>But what's the standard static server (the one that's grabbed half of Apache's share of the web in the last few years)? Yep, <a href="http://nginx.com/">nginx</a> </li>
<li>So let's set up the front end of our site as nginx acting as a reverse proxy to Apache, with Apache just doing the PHP work and nginx serving all images. So modify Apache to serve only on localhost port 8080 and flip the site over to an nginx front end, with the following nginx conf (with the braces closed off, and the cache/cms/uploads location as a regex match) ...<br /><br />server {<br />
listen 80;<br />
server_name moodpin.co.uk;<br />
<br />
location ~ ^/(cache|cms|uploads) {<br />
root /var/www/html/;<br />
expires 7d;<br />
access_log /var/log/nginx/d-a.direct.log;<br />
}<br />
<br />
location ~* \.(css|rdf|xml|ico|txt|gif|jpg|png|jpeg)$ {<br />
expires 365d;<br />
root /var/www/html/;<br />
access_log /var/log/nginx/d-a.direct.log;<br />
}<br />
<br />
location / {<br />
proxy_pass http://127.0.0.1:8080/;<br />
}<br />
}<br />
<br />
Wow, wow - so take that 3-6 minutes and replace it with 1-2 seconds. </li>
<li>So how many images on the home page? About 150, plus more with scrolling ... so that means we have a site that is on average under 0.5% dynamic code-driven content and 99.5% static content/requests per page.<br />That is a very, very static site - hence the 100x faster speed!</li>
<li>So there you go client take that souped up smart car and go </li>
<li>Client replies ... ummm, site's down - server proxy timeout error</li>
<li>Go to Google and check, so we have to make sure that nginx has timeout settings greater than Apache's - and nginx default timeout is 60 seconds</li>
<li>Make the nginx *_timeout settings 10 minutes ... sounds bad; try the site, and it consistently delivers pages in 3 seconds or so. I assume that the app's scrolling page-update requests make the required timeout much longer than the apparent time within which Apache is delivering the PHP</li>
<li>Show the client again, he's happy.</li>
<li>Few days later ... this bit of the sites not working now</li>
<li>Check the code; discover that there is a handful of javascript files used by the system that are not really static - they are PHP templates generating javascript that appears static. Remove js file types from the list of files above in the nginx config. Hurray, the generated javascript is served from Apache PHP now. That bit of the site works again</li>
<li>OK we are done ... don't run Apache bench against the site ... if the client actually gets any users and it can't cope - tell him to upgrade his instance.</li>
</ol>
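For reference, the nginx timeout change in steps 14-15 amounts to a couple of proxy directives. A minimal sketch, with the 600s value matching the 10 minutes settled on above - these would sit inside the location / block that proxies to Apache:

```
# inside the location / { } block proxying to Apache on :8080
proxy_send_timeout 600s;
proxy_read_timeout 600s;
```

proxy_read_timeout is the one that governs how long nginx waits for Apache to respond, so it is the setting that has to comfortably exceed Apache's own Timeout.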
<br />
I hope my tales of devops debuggery are useful to you. Bye!<br />
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-60542736321975404192014-01-13T21:58:00.003+00:002014-01-13T21:58:53.233+00:00Postgres character set conversion woesI had to struggle with sorting out some badly encoded data in Postgresql over the last day or so.<br />
This proved considerably more hassle than I expected, partly due to my ignorance of the correct syntax to use to convert textual data.<br />
<br />
So on that basis I thought I would share my pain!<br />
<br />
There are a number of issues with character sets in relational databases.<br />
<br />
For a Postgres database the common answers often relate to fixing the <a href="http://www.postgresql.org/docs/9.2/static/multibyte.html">encoding of the whole database</a>. So if this is the problem the fixes are often just a matter of setting your client encoding to match that of the database. Or to dump the database then create a new one with the correct encoding set, and reload the dump.<br />
<br />
However there are cases where the encoding is only problematic for certain fields in the database, or where you are creating views via database links between two live databases of different encodings - and so need to fix the encoding on the fly via these views.<br />
<br />
Ideally you have two databases that are both correctly encoded, but just use different encodings.<br />
If this is the case you can just use <a href="http://www.postgresql.org/docs/9.1/static/functions-string.html">convert(data, 'encoding1', 'encoding2')</a> for the relevant fields in the view.<br />
<br />
Then you come to the sort of case I was dealing with, where the encoding is too mashed for this to work - where strings have been pushed in as raw byte formats that either don't relate to any proper encoding, or use different encodings in the same field.<br />
<br />
In these cases any attempt to run a convert encoding function will fail, because there is no consistent 'encoding1'<br />
<br />
The symptom of such data is that it fails to display, so sometimes it's difficult to notice until the system or programming language that is accessing the data throws encoding errors.<br />
In my case the <a href="http://www.pgadmin.org/">pgAdmin</a> client failed to display the whole field, so although the field appeared blank, matches with like '%ok characs%' or length(field) still worked OK. Whilst the normal psql command displayed all the characters except for the problem ones, which were just missing from the string.<br />
<br />
This problem has two solutions:<br />
<br />
1. Repeat the dump and rebuild approach with the correct encoding, but write a custom script in Perl, Python or the like to fix the mashed encoding - assuming that the mashing is not so entirely random as to be unfixable via an automated script*. If it is - then you either have to detect and chuck away bad data - or manually fix things!<br />
<br />
2. Fix the problem fields via pl/pgsql, pl/python or pl/perl functions that process them to replace known problem characters in the data.<br />
<br />
I chose to use pl/pgsql since I had a limited set of these problem characters, so didn't need the full functionality of Python or Perl. However in order for pl/pgsql to be able to handle the characters for fixing, I did need to turn the problem fields into raw byte format.<br />
<br />
I found that the conversion back and forth to bytea was not well documented, although the built-in functions to do so were relatively straightforward...<br />
<br />
Text to Byte conversion => <b>text_field::bytea</b><br />
<br />
Byte to Text conversion => <b>encode(text_field::bytea, 'escape')</b><br />
<br />
So employing these for fixing the freaky characters that were used in place of escaping quotes in my source data ...<br />
<br />
CREATE OR REPLACE FUNCTION encode_utf8(text)<br />
RETURNS text AS<br />
$BODY$<br />
DECLARE<br />
encoding TEXT;<br />
BEGIN<br />
-- single quote as superscript a underline and Yen characters <br />
<br />
IF position('\xaa'::bytea in $1::TEXT::BYTEA) > 0 THEN<br />
RETURN encode(overlay($1::TEXT::BYTEA placing E'\x27'::bytea from position('\xaa'::bytea in $1::TEXT::BYTEA) for 1), 'escape');<br />
END IF;<br />
<br />
-- double quote as capital angstroms character <br />
IF position('\xa5'::bytea in $1::TEXT::BYTEA) > 0 THEN<br />
RETURN encode(overlay($1::TEXT::BYTEA placing E'\x22'::bytea from position('\xa5'::bytea in $1::TEXT::BYTEA) for 1), 'escape');<br />
END IF;<br />
RETURN $1;<br />
END;<br />
$BODY$<br />
LANGUAGE plpgsql;<br />
<br />
Unfortunately the Postgres <a href="http://www.postgresql.org/docs/current/static/functions-binarystring.html">byte string functions</a> don't include an equivalent to a string replace and the above function assumes just one problem character per field (my use case), but it could be adapted to loop through each character and fix it via use of overlay.<br />
So the function above allows for dynamic data fixing of improperly encoded text in views from a legacy database that is still in use - via a database link to a current UTF8 database.<br />
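If instead you take the dump-and-rebuild route (solution 1), the same repairs can be sketched as a small filter in Python. This is a minimal sketch, assuming as above that the stray \xaa byte stands in for a single quote and \xa5 for a double quote - with the bonus that bytes.replace fixes every occurrence in a field, not just the first:

```python
# Repair known bad bytes in a dumped file before reloading it as UTF-8.
# The byte-to-character mapping below is the assumption described above.
FIXES = {
    b"\xaa": b"'",   # single quote stored as the superscript-a-underline byte
    b"\xa5": b'"',   # double quote stored as the Yen/angstrom byte
}

def fix_bytes(raw: bytes) -> bytes:
    """Replace every known mashed byte with its intended character."""
    for bad, good in FIXES.items():
        raw = raw.replace(bad, good)
    return raw
```

Run it over the raw dump (fix_bytes(open(dump_path, 'rb').read())) before loading the result into the correctly encoded new database.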
<br />
* For example in Python you could employ <a href="https://pypi.python.org/pypi/chardet">chardet</a> to autodetect possible encoding and apply conversions per field (or even per character)<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-89652388416302007092014-01-06T14:47:00.002+00:002014-01-06T16:25:56.106+00:00WSGI functional benchmark for a Django Survey ApplicationI am currently involved in the redevelopment of a <a href="http://www.survey.bris.ac.uk/">survey creation tool</a>, that is used by most of the UK University sector. The application is being redeveloped in Django, creating surveys in Postgresql and writing the completed survey data to Cassandra.<br />
The core performance bottleneck is likely to be the number of concurrent users who can simultaneously complete surveys. As part of the test tool suite we have created a custom Django command that uses a browser robot to complete any survey with dummy data.<br />
I realised when commencing this WSGI performance investigation that this functional testing tool could be adapted to act as a load testing tool.<br />
So rather than just getting general request statistics - I could get much more relevant survey completion load data.<br />
<br />
There are a number of more thorough benchmark posts of raw pages using a wider range of WSGI servers - eg. <a href="http://nichol.as/benchmark-of-python-web-servers">http://nichol.as/benchmark-of-python-web-servers</a> , however they do not focus so much on the most common ones that serve Django applications, or address the configuration details of those servers. So though less thorough, I hope this post is also of use.<br />
<br />
The standard configuration to run Django in production is the dual web server setup. In fact Django is pretty much designed to be run that way, with contrib apps such as <a href="https://docs.djangoproject.com/en/dev/ref/contrib/staticfiles/">static files</a> provided to collect images, javascript, etc. for serving separately from the code. This recognizes that in production a web server optimized for serving static files is going to be very different from one optimized for a language runtime environment, even if both are the same web server, eg. Apache. So ideally the site would be delivered via two differently configured, separate Apaches: a fast and light static-configured Apache on high I/O hardware, and a mod_wsgi-configured Apache on large-memory hardware. In practice Nginx may be easier to configure for static serving, or, for a larger globally used app, perhaps a CDN.<br />
This is no different from optimising any web application runtime, such as Java Tomcat. Separate static file serving always offers superior performance.<br />
<br />
However these survey completion tests, are not testing static serving, simpler load tests suffice for that purpose. They are testing the WSGI runtime performance for a particular Django application.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6lYXU9Uh9vOsxvEQKsT059ls4nh_inNz-d-01PfjfBFgHYhyphenhyphenhvTlx9XZX0i8liGmUC5oAqbXltdZ_o6gB2QKc2UHPxpOqNhO3PODUhj1JBD8g9SVQObvjQ11hyz19GADudvRjQ0md043j/s1600/chart_1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6lYXU9Uh9vOsxvEQKsT059ls4nh_inNz-d-01PfjfBFgHYhyphenhyphenhvTlx9XZX0i8liGmUC5oAqbXltdZ_o6gB2QKc2UHPxpOqNhO3PODUhj1JBD8g9SVQObvjQ11hyz19GADudvRjQ0md043j/s1600/chart_1.png" /></a></div>
<h3>
Conclusions</h3>
Well, you can draw your own, for whatever load you require of a given set of hardware resources! You could of course just upgrade your hardware :-) <br />
<br />
However, clearly uWSGI is best for consistent performance at high loads, whilst<br />
Apache MPM worker outperforms it when the load is not so high. This is likely to be due to the slightly higher memory per thread that Apache uses compared to uWSGI.<br />
<br />
Using the default Apache MPM process may be OK, but can make you much more open to DOS attacks, via a nasty performance brick wall. Whilst daemon mode may result in more timeout fails as overloading occurs. <br />
<br />
Gunicorn is all Python so easier to set up for multiple django projects on the same hardware, and performs consistently across different loads, if not quite as fast overall. <br />
<br />
I also tried a couple of other python web servers, eg. tornado, but the best I could get was over twice as slow as these three servers, they may well have been configured incorrectly, or be less suited to Django, either way I did not pursue them.<br />
<br />
Oh and what will we use?<br />
<br />
Well probably Apache MPM worker will do the trick for us, with a separate proxy front-end Apache configured for static file serving.<br />
At least that way, its all the same server that we need to support, and one that we are already well experienced in. Also our static file demands are unlikely to be sufficient to warrant use of Nginx or a CDN.<br />
<br />
I hope that these tests may help you, if not make a decision, maybe at least decide to try out testing a few WSGI servers and configs, for yourself. Let me know if your results differ widely from mine. Especially if there are some vital performance related configuration options I missed!<br />
<br />
<h3>
Running the functional load test </h3>
To run the survey completion tool with a number of concurrent users, and collect stats on this, I wrapped it up in test scripts for <a href="http://locust.io/">locust</a>. <br />
<br />
So each user completes one each of seven test surveys.<br />
The locust server can then be handed the number of concurrent users to test with and the test run fired off for 5 minutes, over which time around 3-4000 surveys are completed.<br />
<br />
The number of concurrent users tested with was 10, 50 and 100<br />
With our current traffic peak loads will probably be around the 20 users mark with averages of 5 to 10 users. However there are occasional peaks higher than that. Ideally with the new system we will start to see higher traffic, where the 100 benchmark may be of more relevance.<br />
<br />
<h3>
Fails</h3>
A number of bad configs for the servers produced a lot of fails, but with a good config these seem to be very low. All 3 x 5-minute test runs for each setup created around 10,000 surveys; these are the actual numbers of fails in 10,000 -<br />
so insignificant, perhaps ...<br />
<br />
Apache MPM process = 1<br />
Apache MPM worker = 0<br />
Apache Daemon = 4<br />
uWSGI = 0<br />
Gunicorn = 1<br />
<br />
(so the fastest two configs both had no fails, because neither ever timed out)<br />
<br />
<h3>
Configurations</h3>
<div>
The test servers were run on the same virtual machine, the spec of which was<br />
a 4 x Intel 2.4 GHz CPU machine with 4Gb RAM<br />
So optimum workers / processes = 2 * CPUs + 1 = 9<br />
<br />
The following configurations were arrived at by tinkering with the settings for each server until optimal speed was achieved for 10 concurrent users.<br />
Clearly this empirical approach may result in very different settings for your hardware, but at least it gives some idea of the appropriate settings - for a certain CPU / memory spec. server.<br />
<br />
For Apache I found things such as WSGIApplicationGroup being set or not was important, hence its inclusion, with a 20% improvement when on for MPM prefork or daemon mode, or off for MPM worker mode.</div>
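As a side note, the optimum-workers rule of thumb used above can be sketched as a one-liner:

```python
import multiprocessing

def optimal_workers(cpus: int) -> int:
    """Common WSGI sizing rule of thumb: 2 * CPUs + 1 workers."""
    return 2 * cpus + 1

# The 4-CPU test VM above gives the 9 workers used in the configs that follow.
print(optimal_workers(4))                            # 9
print(optimal_workers(multiprocessing.cpu_count()))  # for the current host
```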
<div>
<h4>
Apache mod_wsgi prefork</h4>
WSGIScriptAlias / /virtualenv/bin/django.wsgi<br />
WSGIApplicationGroup %{GLOBAL}</div>
<div>
<h4>
Apache mod_wsgi worker</h4>
WSGIScriptAlias / /virtualenv/bin/django.wsgi<br />
<br />
<IfModule mpm_worker_module><br />
# ThreadLimit 1000 <br />
StartServers 10<br />
ServerLimit 16<br />
MaxClients 400<br />
MinSpareThreads 25<br />
MaxSpareThreads 375<br />
ThreadsPerChild 25<br />
MaxRequestsPerChild 0<br />
</IfModule>
</div>
<div>
<h4>
Apache mod_wsgi daemon</h4>
WSGIScriptAlias / /virtualenv/bin/django.wsgi<br />
WSGIApplicationGroup %{GLOBAL}<br />
<br />
WSGIDaemonProcess testwsgi \<br />
python-path=/virtualenv/lib/python2.7/site-packages \<br />
user=testwsgi group=testwsgi \<br />
processes=9 threads=25 umask=0002 \<br />
home=/usr/local/projects/testwsgi/WWW \<br />
maximum-requests=0<br />
<br />
WSGIProcessGroup testwsgi
</div>
<div>
<h4>
uWSGI</h4>
uwsgi --http :8000 --wsgi-file wsgi.py --chdir /virtualenv/bin \<br />
--workers=9 --buffer-size=16384 --disable-logging<br />
<br />
<br />
<h4>
Gunicorn</h4>
django-admin.py run_gunicorn -b :8000 --workers=9 --keep-alive=5
</div>
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-16766578749787678792013-11-21T09:58:00.000+00:002013-11-21T10:11:16.790+00:00Django Cardiff User GroupLast night I went to the <a href="http://www.eventbrite.co.uk/e/django-cardiff-user-group-meeting-registration-9205751651">second meeting</a> of the Django Cardiff User Group.<br />
<br />
This is a sister group to the <a href="https://groups.google.com/forum/#!forum/dbug">DBBUG</a> Bristol based one that I have been attending for the last 5 years. It was organised by Daniele Procida, who started attending DBBUG events a few years ago and has now decided to spread the word over the Severn, in Wales.<br />
<br />
He is also organising the first UK Django conference in a couple of months, <a href="https://djangoweekend.org/">https://djangoweekend.org/</a> - so it's good to see one open source / Python group be the inspiration for spawning another, and one that is perhaps more organisationally active than its progenitor.<br />
<br />
The evening was fun, and it was good to meet and chat with Djangonauts over the border.<br />
<br />
Andrew Godwin, Django core developer / release manager, gave us an update on all the new goodies to be added in <a href="https://docs.djangoproject.com/en/dev/releases/1.7/">Django 1.7</a><br />
So this release is largely about really sorting out the niggling issues with relational database features, and the low level ORM handling of them.<br />
It sees rationalisation of transaction handling with the use of nestable atomic statements, addition of generic connection pooling, and handling of composite keys.<br />
<br />
Daniele demonstrated how to fly a helicopter (a toy one) via the Python command line, although Andrew seemed rather more adept at landing it safely. I gave a little reprise of <a href="http://bit.ly/megadj2013">a talk</a> introducing DBBUG and how a developer can follow the road to their own open source contributions.<br />
<br />
Thanks to everyone involved, I hope to get to the Django weekend too.Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-13968749805640566012013-11-21T09:25:00.003+00:002013-11-28T15:25:10.710+00:00The ten commandments of software procurementFor a medium to large scale organisation with its own IT department, I have found that in today's market the following truths of software procurement apply. Yet they are usually poorly understood by staff in organisations outside the software sector, who often view the world through antique pre-1990 glasses - from before the significant impact of web-based providers and the mixed economy of revenue models of modern software companies ...<br />
<ol>
<li>Software is like any other creative output, it differs radically in quality, modernity and appropriateness - and this is entirely unrelated to its cost. Partly because the majority of today's leading software development companies
are internet companies who do not use software charging for revenue. </li>
<li>So whether or not software is charged for directly via a licensing model is unrelated to whether it is mostly open source or closed source / commercial. Some software is no longer purchasable or the paid for solutions are too poor quality to be viable, compared to the free ones. In such cases other non-financial trading decisions must be part of the procurement arsenal. So policies on data release, etc.</li>
<li>Whether something is open or closed source is entirely irrelevant to its quality, scalability or any other attribute you care to name. These days any software stack is likely to be a mix of both.<br />However given source, tests, community and commit rate can all be checked for the former, it is far easier not to pick a lemon, with open source (not that a non-technical organisation tends to use any of these core indicators for procurement assessment).</li>
<li>Software is basically like literature - there are your Barbara Cartlands and your Shakespeares - unfortunately fewer people are able to read it to work out what quality it is, so it's a book which is generally just judged by its cover - hence the common misconception that software is all roughly the same, or that its quality relates to its cost.</li>
<li>However, the more generic a software application is, the more likely it is that you get better quality for a lower cost - standard economy of scale. <br />Hence Google GMail / Microsoft Office / open source Apache - are good quality - because they are large scale generic applications. <br />The more specific an application is, the more likely the software (whether open source or commercial) will have been put together by a core group of at most 3 or 4 developers, hence have less quality control methods applied, be more buggy and risk being generally of a lower standard.</li>
<li>If the IT Services department of your organisation is not sufficiently powerful to tell the users what they are going to get, despite what they want, it is common that many systems it deploys will require significant customisation - and the more specific they are, the more the customisation.<br />Customisation of outsourced, closed source products is likely to incur significantly greater time and development cost than open source ones, whether customised in house or outsourced. If customised in house, then unless the software has a well designed API, docs etc. - ie. is a widely used generic system from a major company - you usually find that you can only do black box integration and wrapper coding, or resort to breaking license agreements by decompiling. All of which is difficult to maintain.<br />If outsourced, then the code may be open, test-suited and documented within the supplying company, but you are likely to be paying the company around 3 times your in-house cost for a junior developer's customisation / bug-fixing time.</li>
<li>Due to historical reasons some types of software have far superior products that are all in one of these camps than the other ... So open source finance software is poor. Closed source web CMS software and repository software is poor, etc.</li>
<li>Non-technical companies will go through a 5-10 year cycle of outsourcing as much software as possible, then auditing consultancy costs, then ballooning internal development to cut costs, then deciding too much development is in house back to outsourcing again. This cycle wastes a lot of money due to its lack of understanding of the benefits of a stable highly selective mixed economy for software of outsourced, open source, commercial and in-house as being the ideal balance of functionality vs. cost.</li>
<li>Buying mix and match products from integrated product suites is a recipe for high cost, eg. MS Exchange Email and Google Docs, rather than all from one or other supplier.</li>
<li>Lastly and most importantly a non-technical organisation always makes its software procurement decisions based on political reasons*. Never on technical ones. This invariably means that it makes decisions that are significantly more costly, difficult to maintain and less well featured than it could achieve using a purely technical assessment process.<br />Usually they will also fail to have processes to properly trial run alternative products in a realistic manner, or to audit selections once the initial purchase is made. This may partly be because although auditing may save significant costs in the long run, it does introduce a means by which a wrong choice can be flagged up. Unfortunately it is often less embarrassing to make do with a bad choice, until its end of life, than admit a failure. Even though failing and acceptance of it as part of the process, is essential to delivery of quality (rather than make do) systems. </li>
</ol>
<div>
<br /></div>
<div>
Thank you ... rant over :-)<br />
<br />
* political reasons - The salesman managed to persuade someone suitably senior, and technically clueless enough, to believe them. This usually goes in tandem with the company software team's response ... the salesman promised them it did what?? ... make damn sure that isn't in the contract / licensing agreement.</div>
<br />
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-9142360042474751252013-06-03T09:44:00.000+01:002013-06-15T11:48:57.546+01:00IT MegameetYes, MegaMeet may have a slightly cheesy ring to it, but the Bristol <a href="http://bristol.itmegameet.co.uk/">IT MegaMeet</a> was a lot of fun, and a great idea for a regional software community event. Unlike most conferences this one is not for a particular company, language, platform or area of software expertise; instead it brings together all the voluntary community software and technology groups within the region of Bristol, UK.<br />
<br />
There are quite a number as it turns out, so squeezing the conference into a single day resulted in 5 tracks. For a conference organised for the first time last year by a student, to save his course (thanks Lyle Hopkins), it rather put our local University's official efforts in software community engagement to shame - though perhaps it might encourage them to rise to the challenge. (Lyle is a student at one university, and I work at the other.)<br />
<br />
Of the perhaps 30-40 software groups based in and around Bristol, over 20 were represented - a good turnout, partly due to the efforts of one of Lyle's fellow volunteers, Indu Kaila, who did the leg work of attending all the local events and getting various members (like myself) to volunteer to represent their group at the event. I am one of the hundred odd members of Bristol and Bath's Django User Group (DBBUG), started by Dan Fairs, and <a href="http://bit.ly/megadj2013">did a presentation</a> about Python, Django, our group, and the process of contributing to open source - rather a lot to pack into 40 minutes, but it seemed to go down OK.<br />
<br />
The full range of enthusiast groups was present, so I started the day finding out how the four colour theorem from map making applies to optimisation algorithms used in compilers, courtesy of the ACCU, who have been around for a very long time, having started out as a C programming community group. Then near the finish I saw a good talk from the Bristol Web folk reminding me of the core issues to remember in front end web development - as more of a back end developer it can be easy to label this stuff somebody else's job, but with an ever increasing slice of the web development stack being client side these days, that is clearly a bad attitude.<br />
<br />
There was more than a smattering of javascript related talks going on, from big data CouchDB and node.js back end use, through to more client side topics, and a very popular session on flying helicopters via javascript code.<br />
<br />
The talks were rounded off with some about the charity cause that the day was helping to raise funds for, a <a href="http://www.insfriends.org.uk/">cross-Atlantic row</a> in aid of a cervical cancer charity (plus an appeal for graphic design work for another member of the volunteer team, from Ukraine, who is in need of health care).<br />
<br />
I then found myself in the rather comical position of receiving two awards, from the extensive award ceremony, for community involvement etc. Both were really on behalf of other people, but it was fun and led on to the free bar and barbecue, always a popular way to round off a conference.<br />
<br />
So thanks to the Megameet team, if nobody else comes forward, I can always represent <a href="https://groups.google.com/forum/?fromgroups=#!forum/dbug">DBBUG</a>, <a href="http://www.meetup.com/south-west-big-data/">South West Big Data</a> or perhaps another new local group, again next year!<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-17988626052542365132012-11-15T01:13:00.001+00:002012-11-15T01:14:25.579+00:00Cookie law, Cookieless and django tips.<h3>
django-cookieless</h3>
Last week I released a new add-on for django, <a href="http://pypi.python.org/pypi/django-cookieless">django-cookieless</a>. It was a relatively small feature required for a current project, and since it was such a generic package it seemed ideal for open sourcing as a separate egg. It made me realise that I hadn't released a new open source package for well over a year, so this one is certainly long overdue in that sense.<br />
<br />
<h3>
Cookie Law</h3>
It is also overdue in another sense: <a href="http://www.ico.gov.uk/">EU Cookie law</a> has been in force since May 2011, so legally any sites that are used in Europe and set cookies which are not strictly necessary for the functioning of the site must now request the user's permission before doing so. Of course it remains to be seen if any practical enforcement measures will happen, although they were due this summer in the UK, for example. Hence many of the first rush of JavaScript pop up style solutions have come and gone, as a result of user confusion. But for public sector clients particularly, it is certainly simpler to just not use cookies if they are not technically required. It may also, at least, make developers rather less blasé about setting cookies.<br />
<br />
Certainly most people would prefer not to see their browsers filled with deliberate user tracking and privacy invasive cookies that are entirely unrelated to the site's functionality, in the same way most of us don't like being tracked by CCTV everywhere we go. Unfortunately, the current law doesn't have a good technical solution behind it, and hence may well founder over time. This is because cookie control is too esoteric for ordinary users, and even with easy browser based privacy configuration, any technical solution is problematic: a single cookie can be used both to protect privacy (in terms of security - e.g. a CSRF token) and to invade it, and it is entirely down to the specific application's usage where the distinction lies. Invasive tracking can also be implemented via other session maintenance tools, such as URL rewriting, yet because no data is written to the user's browser these are outside the remit of the law. So the law makes little sense currently, and may well be unenforceable.<br />
<br />
Perhaps it would have been better to aim laws at encouraging adherence to set standards of user tracking, starting with compliance with the browser 'Do Not Track' header and perhaps adding some more subtle gradations over time - with the targets of the law being companies whose core business is user tracking for advertising sales etc., starting with Google and working down. Rather than pushing the least transgressive public service sector, as the most likely to comply, into adding a bunch of annoying 'Will you accept our cookies?' pop ups.<br />
<br />
However, even if this law dries up and blows away, for our particular purposes we needed django to cater for any number of sessions per browser (as well as not using cookies for anonymous users).<br />
Django's default session machinery requires cookies, so it ties a browser to a single session - request.session is set against a cookie. But because django-cookieless provides sessions maintained by form posts, it automatically delivers multiple sessions per browser.<br />
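Conceptually, carrying the session in form posts rather than a cookie might be sketched as follows. This is a framework-free illustration with made-up names, not the actual django-cookieless code (which hooks into Django's middleware and rewrites rendered forms to embed a hidden session field):

```python
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"  # stand-in for Django's SECRET_KEY setting

def make_token(session_id, client_hint):
    # Sign the session id together with a client hint (e.g. IP address or
    # user agent), so a token lifted from one client is harder to replay
    # from another - the kind of risk-reduction setting mentioned above.
    msg = "%s:%s" % (session_id, client_hint)
    sig = hmac.new(SECRET_KEY, msg.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (session_id, sig)

def session_from_post(token, client_hint):
    # Recover the session id posted back in the hidden form field, or
    # return None if the signature does not match (wrong client, or a
    # tampered token).
    session_id = token.rsplit(":", 1)[0]
    expected = make_token(session_id, client_hint)
    if hmac.compare_digest(token, expected):
        return session_id
    return None
```

Each rendered form would carry the token as a hidden input, e.g. `<input type="hidden" name="session_token" value="...">` - and since two browser tabs can post back two different tokens, each maintains an independent session.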
<br />
There are a number of security implications of not using cookies, which revolve around the difficulty of preventing session stealing without them. Given this, django-cookieless has a range of settings to reduce that risk, but even so I wouldn't recommend using it for sessions that are tied to authenticated users, which could lead to privilege escalation if the session were stolen.<br />
<br />
<h3>
Django Tips</h3>
I thought the egg would be done in a day, but in reality it took a few days, due to a number of iterations that were necessary as I discovered a series of features in the lesser known (well, to me) parts of django. So I thought I would share these below, in case any of the tips I gained are useful ...<br />
<br />
<ol>
<li>The request object life cycle goes through three main states in django:<br /><b>unpopulated</b> - the request that is around at the time of process_request type middleware hooks, before it gets passed by the URL handler to decorators and then views.<br /><b>partly populated</b> - the request that has session, user and other data added to it (mainly by decorators) and gets passed to a view.<br /><b>fully populated</b> - the request that has been passed through the view to add its data, and is used to generate a response - this is the one that process_response sees.</li>
<li>I needed to identify requests that were decorated with my no_cookies decorator at the time of process_request, but the flag the decorator sets has not been set yet at that point. However there is a useful utility to work around this, django.core.urlresolvers.resolve, which, when passed a path, gives a match object containing the view function to be used - and hence its decorators, if any.</li>
<li>Template tags that use a request get the unpopulated one by default. I needed the request to have the session populated, for the option of adding manual session tags - see the <a href="https://github.com/edcrewe/django-cookieless/blob/master/cookieless/templatetags/cookieless_tags.py">tags code</a>. To have the partly populated request available in tags, <span style="background-color: white; color: #dd1144; font-family: Consolas, 'Liberation Mono', Courier, monospace; font-size: 12px; line-height: 16px; white-space: pre;">django.core.context_processors.request </span><span style="background-color: white; font-family: Consolas, 'Liberation Mono', Courier, monospace; font-size: 12px; line-height: 16px; white-space: pre;">must be added to the</span><span style="background-color: white; color: #dd1144; font-family: Consolas, 'Liberation Mono', Courier, monospace; font-size: 12px; line-height: 16px; white-space: pre;"> TEMPLATE_CONTEXT_PROCESSORS </span> in settings.<br /><br />
</li>
<li>The django test framework's test browser is in effect a complex mocking tool that mocks up the action of a real browser; however, like any mock object, it may not exactly replicate the behaviour one desires. In my case it only turns on session mocking if it finds the standard django session middleware in settings. With cookieless it isn't there, because cookieless acts as a replacement for it, and as a wrapper that uses it for views undecorated with no_cookies. Hence I needed to <a href="https://github.com/edcrewe/django-cookieless/blob/master/cookieless/decorators.py">use a trick</a> - setting a TESTING flag in settings - to allow for flipping cookieless on and off.</li>
</ol>
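Tips 1 and 2 combine in practice roughly like this - a toy stand-in for Django's resolver with illustrative names, not the real django-cookieless code (which uses django.core.urlresolvers.resolve from a middleware class):

```python
def no_cookies(view):
    # The decorator just tags the view function; at process_request time
    # nothing is on the request yet, so middleware must look the view up
    # by path and read this flag instead.
    view.no_cookies = True
    return view

@no_cookies
def survey_view(request):
    return "anonymous form page"

def plain_view(request):
    return "normal page"

# Stand-in for the URL conf: maps a path to its view function, the way
# resolve() returns a match object holding the view to be called.
URLCONF = {"/survey/": survey_view, "/home/": plain_view}

def resolve(path):
    return URLCONF[path]

def wants_cookieless(path):
    # What a process_request hook can do: resolve the path to its view
    # function and inspect the decorator-set attribute.
    return getattr(resolve(path), "no_cookies", False)
```

The same attribute-tagging trick works for any decorator whose effect middleware needs to know about before the view runs.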
Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-87141988280593878292012-10-23T15:33:00.000+01:002013-11-21T09:26:17.629+00:00My struggles with closed source CMSI produce a content migration tool for the Plone CMS, called <a href="http://pypi.python.org/pypi/ilrt.contentmigrator">ilrt.contentmigrator</a>. It wraps up zope's generic setup as an easily configurable tool to export content from zope's bespoke object database, the ZODB, to a standard folder hierarchy of content with binary files and associated metadata in XML.<br />
<br />
Some time ago I added a converter to push content to Google Sites, and I have recently been tasked with pushing it to a commercial CMS. Unfortunately, rather than a few days' work as before, this has turned into a marathon task, which I am still unsure is achievable, due to political and commercial constraints.<br />
<br />
So I thought I should at least document my progress, or lack of it, as a lesson for other naive open source habituated developers to consider their estimates carefully when dealing with a small closed source code base of which they have no experience.<br />
<br />
<h3>
Plan A - Use the API</h3>
<br />
So the first approach, which I assumed would be the simplest, was to directly code a solution using "the API".<br />
<br />
"API" is in quotes here since, in common with many small commercial software suppliers, the name API was in fact referring to an automated JavaDoc dump of all their code; there was no API abstraction layer, or external RESTful / SOAP API, to call. It's basically the equivalent of "read the source" for open source projects - but with the huge disadvantage of only legally having access to the bare, largely uncommented, class and method names, not the source itself to see how they worked - or why they didn't.<br />
<br />
Also no other customers had previously attempted to write code against the content creation part of the code base.<br />
<br />
Anyhow back to the project, content import code was written and run, but nothing changed via the web front end.<br />
<br />
It turns out that without a cache refresh the Admin interface does not display anything done via the API, hence it is essential to be able to determine if changes have occurred.<br />
<br />
Similarly if content is not cleared from the waste-basket then it cannot be recreated in the same location, along the lines of a test import scenario.<br />
<br />
Having written the code to set up the cache and other API managers and clear it, I discovered that cache refresh doesn't work via the API; neither does clearing the waste-basket.<br />
<br />
The only suggested solution was to turn the CMS off and on again.<br />
<br />
<h3>
Plan B - Use the API and a Robot</h3>
<br />
Rather than resort to such a primitive approach, I decided to develop a Selenium WebDriver based robot client. This could log into the CMS and run all the sequences of screen clicks it takes to clear the waste-basket and cache after an API delete has been called.<br />
<br />
Eventually all this was in place; now content could be created via the API, and media loaded via the robot (since, again, anything that may use local file system caches or file storage is inoperable via the API).<br />
<br />
The next stage was to create the folder hierarchy and populate it with content.<br />
<br />
Unfortunately at this point a difficult to trace API bug reared its head. If a subfolder is created in a folder via the API, it gets created in a corrupted manner, and blocks subsequent attempts to access content in that folder, because the subsection incorrectly registers itself as content - which is then found to be missing. After much time spent tracing this bug, the realisation dawned that it would not be viable to create anything but a subset of content objects via the API, and everything else would need the robot mixed in to work.<br />
<br />
This seemed like a much less maintainable solution, especially since most pages of the CMS had 50 or more javascript files linked to them, so only a current browser WebDriver client robot would function with it at all. Even then, often the only way to get the robot clicks and submits to work was to grab the javascript calls out of the source and call the jQuery functions directly with the WebDriver javascript engine.<br />
<br />
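The robot steps might look something like this - a sketch only, with hypothetical URLs, field names and element ids (the real selectors are specific to the closed source product), where the driver argument would be a Selenium WebDriver instance:

```python
def empty_wastebasket(driver, base_url, username, password):
    # Log in through the admin screens (URL and field names hypothetical).
    driver.get(base_url + "/admin/login")
    driver.find_element("name", "username").send_keys(username)
    driver.find_element("name", "password").send_keys(password)
    driver.find_element("css selector", "input[type=submit]").click()
    # The empty button is wired up in one of the page's many javascript
    # files, so rather than simulating a click on it, call the jQuery
    # handler directly through the browser's javascript engine.
    driver.get(base_url + "/admin/wastebasket")
    driver.execute_script("jQuery('#empty-wastebasket').trigger('click');")
```

Selenium's find_element(by, value) takes the locator strategy as the first argument ("name", "css selector", etc., matching the By constants), and execute_script runs arbitrary javascript in the page, which is what makes the jQuery-calling workaround possible.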
<h3>
Plan C - Use the import tool and a Robot</h3>
<br />
So having wasted 3 days tracing a bug in the (closed source) API, it was time to take a step back and think about whether there was realistically a means by which an import tool could be created by a developer outside of the company supplying the CMS, i.e. me.<br />
<br />
Fortunately the CMS already had an XML export / import tool, so all we needed to do was convert our XML format to the one used by the company, and the rest of the code was their responsibility to maintain.<br />
<br />
At first their salesman seemed fine with this solution, so I went away and started on that route, leaving the existing code at the point where the sub-folder creation API bug blocks it from working.<br />
<br />
However, on trying out the CMS tool, it also failed to work in a number of ways. The problems that it currently has are listed below, and my present focus is writing a selenium based test suite that will perform a simple folder, content and media export and import with it.<br />
<br />
Once the tool passes, we have confirmation that the API works (at least within the limited confines of its use within the tool). We can then write a converter for the XML format and a driver for the tool - or even revisit the API + robot route, if it's fixed.<br />
<br />
Below are the issues, that need to work, and that the test suite is designed to confirm are functional ...<br />
<h3>
</h3>
<h3>
Content Exporter Issues (status in brackets)</h3>
<ol>
<li>The folder hierarchy has to be exported separately from the content. If you select both at once - it throws an error (<b>minor</b> - run separately)</li>
<li>The hierarchy export appends its data when exported to the same folder, so creating an invalid hierarchy.xml after the first run (<b>minor</b> - could delete on the file system in between) </li>
<li>Hierarchy export doesn't work. It just creates XML with the top level folder title wrapped in tags containing the default configuration, attributes - but no hierarchy of folders. (<b>blocker</b> - need the hierarchy especially to work, since the sub-folder creation was the blocking bug issue with using the API directly)</li>
<li>Content export only does one part of one page at a time, ie. a single content item (<b>minor</b> - this means that it is not a very useful export tool for humans - however via a robot - it could be run hundreds of times to get a folder done)</li>
<li>The embedded media export doesn't work, no media is created (<b>blocker</b> - we need to be able to do images and files)</li>
<li>Content import - a single content item works, and if the media already exists with the right id, that works too. Can't judge media import - since media export fails, I have no format to follow. (<b>blocker</b> - we need media to work as a minimum. Ideally we could import all the parts of a page in one go - or even more than one page at once!)</li>
<li>Hierarchy import - Creating a single section works. Cannot judge for subsections - since the export doesn't work. (<b>pass?</b>)</li>
<li>Configuration changes can break the tool (<b>blocker</b> - the whole point of the project is to provide a working tool for a phased transition of content, it should work for a period of at least two years)</li>
<li>Not sure if the tool can cope with anything but default T4 metadata (<b>minor</b> - A pain but the metadata changes to existing content are part of the API that should function OK directly, so could be done separately to the tools import of content.)</li>
</ol>
Once we have a consistently passing CMS tool, we can assess the best next steps.<br />
<br />
The testing tool has proved quite complex to create too, because of the javascript issues mentioned above, but it now successfully tests the run of an export of a section and a piece of content, checking the exported XML file, and also runs the import for these to confirm the functionality is currently at the level listed above.<br />
<br />
Having been burnt by my experience so far, my intention is to convert the Plone export XML and files to the new CMS's native XML format, push it to the live server, and run the robot web browser to trigger its import - so that eventually we will have a viable migration route, as long as the suppliers ensure that their tool (and underlying API) are in working order.<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-11942490926901148912012-07-09T14:56:00.000+01:002012-07-09T22:50:04.642+01:00Review of talks I attended or was recommended at Europython 2012This is a quick jotting down of recommendations with video links related to this years Europython.<br />
It is really intended for other developers in my workplace.<br />
But in case it has wider relevance I have posted it to my blog. Apologies for the rough format - and remember that with 5 tracks and 2 trainings, I was only exposed to a small portion of the talks.<br />
<br />
<a href="http://www.youtube.com/user/PythonItalia/videos">YouTube channel of all talks</a><br />
<br />
<h3>
Inspiring / Interesting talks</h3>
<br />
<span style="background-color: white;"><a href="http://www.youtube.com/watch?v=9gbUFyPltDs&feature=plcp">Permission or forgiveness</a></span><br />
Linking women programmers, and the approach of Grace Hopper, inventor of the compiler, to the wider implications of that approach: to enable creativity in an organisation, the rules that ensure its survival must be broken. Since middle management's behaviour will inevitably default to blocking permission for innovations, just ignore them wrt. anything technical, for the greater good!<br />
<br />
<a href="http://www.youtube.com/watch?v=OTHggyZAot0&feature=plcp">Music Theory - Genetic algorithms and Python</a><br />
Fun and enthusiastic use of Python to rival the masters of classical music!<br />
<br />
<a href="http://www.youtube.com/watch?v=RwbEEzl3bL4&feature=plcp" style="background-color: white;">State of python</a><br />
<span style="background-color: white;">So a general view of dynamic langs being on the up - ruby, js etc.</span><br />
Seems that static typing snobbery is Guido's bugbear.<br />
<span style="background-color: white;">Increase in popularity shown by 800 at ep12, compared to 500 at ep10</span><br />
<span style="background-color: white;">Then a bunch of internal python language decision stuff, and dealing with trolls</span><br />
<span style="background-color: white;"><br /></span><br />
<a href="http://www.youtube.com/watch?v=lmuhyc4aPYs&feature=plcp">Stop censorship with Python</a><br />
Tor project used to allow uncensored internet in China, etc.<br />
<br />
<a href="http://www.youtube.com/watch?v=BaqaIw2c91o&feature=plcp">The larch environment</a><br />
Ever wanted to write code with pictures rather than boring old text?<br />
Pretty amazing what this PhD student has put together.<br />
<br />
<a href="http://www.youtube.com/watch?v=EwLih26Cjfs&feature=plcp">Aspect orientated programming</a><br />
Possibly inspire you to stretch a paradigm as far as it will go (even to breaking point?).<br />
<br />
<h3>
SW Design and APIs</h3>
<div>
<a href="http://www.youtube.com/watch?v=bv89IOFvn7o&feature=plcp">Best to stick with last year's Python API design talk</a><br />
<br /></div>
<h3>
Scaling or deployment (for Django)</h3>
<h4 style="font-weight: normal;">
<b>Django under massive loads</b><br />Good coverage of scaling django especially wrt. running on Postgresql. Coverage of classic issues wrt. performance and the django ORM. So for example using slice with a queryset always loads the whole queryset into memory. </h4>
<h4>
<a href="http://www.youtube.com/watch?v=gmIG56Hf9dc&feature=plcp" style="font-weight: normal;"> How to bootstrap a startup with Django</a><br /><span style="font-weight: normal;">Coverage of the standard add on components for a medium scale django deployment</span></h4>
<h4>
<span style="font-weight: normal;"> </span></h4>
<div>
<div id="watch-headline-title">
<a href="http://www.youtube.com/watch?v=yxALwwDyWoA&feature=plcp">Bitbucket - <span class="" dir="ltr" id="eow-title" title="Healthy webapps through continuous introspection">Healthy webapps through continuous introspection </span></a></div>
</div>
<div>
Have released geordi, django-dogslow and interruptingcow to handle issues.</div>
<div>
<ul>
<li><span style="background-color: white;"><a href="https://bitbucket.org/brodie/geordi">geordi</a> provides a URL based means of getting full PDF profiling reporting back for pages</span></li>
<li><span style="background-color: white;"><a href="http://pypi.python.org/pypi/dogslow/">dogslow </a>does monitoring and email reporting of hotspots with traceback from live systems.</span></li>
<li><span style="background-color: white;"><a href="http://pypi.python.org/pypi/interruptingcow/">interruptingcow </a>allows setting of nested timeout levels, so expensive operations can be cut short in favour of lighter ones on the web server</span></li>
</ul>
</div>
<div>
<br /></div>
<a href="http://www.youtube.com/watch?v=G2MfIP7GT4M&feature=plcp" style="background-color: white;">Spotify question session</a><br />
Useful insights into scaling - particularly for large scale Python applications using Cassandra.<br />
<br />
Need to be careful with the compaction routine: it halves load capacity due to the spikes it makes, so it sometimes has to jump if overloaded.<br />
Users don't see errors - fixing this last percentage is very hard. Instead they go <span style="background-color: white;">for a 'pretend it's working' approach: just retry to catch failures. </span><span style="background-color: white;">Cassandra - don't upgrade .8 to 1.0. </span><br />
NB: They employed Oracle guys who had worked on the JVM to fix some of Cassandra's issues - well, JVM/Cassandra bugs <span style="background-color: white;">it revealed at load!</span><br />
<div>
<br />
<a href="http://www.youtube.com/watch?v=ENnI2FU3EV4&feature=plcp">What I learned from big web app deployments</a> - how to scale application deployments (zope particularly)</div>
<br />
<a href="http://www.dalkescientific.com/writings/diary/archive/2012/01/19/concurrent.futures.html">concurrent.futures</a><br />
Concurrent programming made easy. The example was bulk processing of a big Apache log. Ditch the old separate threading and multiprocessing libraries for this python 3 package (also backported to 2).<br />
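A minimal version of that log-processing pattern (assuming Apache combined log format, where the status code is the ninth whitespace-separated field) might look like:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def status_counts(lines):
    # Tally HTTP status codes for one chunk of Apache access log lines.
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) > 8:
            counts[parts[8]] += 1
    return counts

def tally(chunks):
    # executor.map() farms each chunk out to a worker and yields the
    # partial results back in order; ProcessPoolExecutor has the same
    # API and is the better fit for CPU-bound parsing (sidesteps the GIL).
    total = Counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for partial in pool.map(status_counts, chunks):
            total += partial
    return total
```

Swapping between threads and processes is a one-line change, which is the main attraction over the older threading / multiprocessing modules.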
<br />
<h3>
Language approaches</h3>
<a href="http://www.youtube.com/watch?v=D3-NZXHO5QI&feature=plcp">Using descriptors</a> - useful to know some of the use cases for these language features<br />
<a href="http://www.youtube.com/watch?v=cMtBUvORCfU&feature=plcp">PyPy JIT</a> - a look under the hood now I know that RPython is not R + Python, but restricted Python.<br />
<a href="http://www.youtube.com/watch?v=x6OL88pzjHQ&feature=plcp">PyPy keynote</a> - coverage of current activity in pypy development<br />
<h3>
Big Data / Mining </h3>
<div>
<a href="http://www.youtube.com/watch?v=_JrZUm9ZHIw&feature=plcp">pandas and pytables</a> - amazing simple API to mine data (in this case financial)</div>
<div>
<br /></div>
<div>
<h3>
<b>Testing</b></h3>
</div>
<div>
<div>
<a href="http://www.youtube.com/watch?v=garsUmsZIac&feature=plcp">Lessons in Testing - Disqus</a></div>
<div>
Useful insights into testing</div>
<div>
Set up included jenkins, nose etc. Run tests concurrently to speed up test suite.</div>
<div>
Note that selenium was painful for them - <span style="background-color: white;">Far too brittle!</span></div>
</div>
<div>
<br /></div>
<div>
<a href="http://www.youtube.com/watch?v=WfyoC0h9QKA&feature=plcp">TDD with selenium</a></div>
<div>
The presenter said this might be a little basic for me - and it was a bit crowded, so I went to other stuff in the end.<br />
<br />
<h3>
Other talks I attended </h3>
<ol>
<li><b><span style="font-weight: normal;">Let your brain talk to computers</span></b></li>
<li>Ask your BDFL</li>
<li>Becoming a better programmer</li>
<li>NDB: The new App Engine datastore library</li>
<li>Advanced Flask patterns (cancelled)</li>
<li>Big A little I (AI patterns in Python)</li>
<li>Increasing women's engagement in Python</li>
<li>Minimalism in Software development</li>
<li>The integrators guide to duct taping</li>
<li>Guidelines to writing a Python API</li>
<li>Composite key is ready for django 1.4</li>
<li>Heavybase: A clinical peer-to-peer database</li>
<li>Beyond clouds: Open source edge computing with SlapOS</li>
<li><a href="http://www.youtube.com/watch?v=OeToCdcv8zo&feature=plcp">Creating federated authorisation for a Django survey system</a> (my talk!)</li>
</ol>
</div>
<div>
<br /></div>
<div>
<a name='more'></a></div>
<h4>
Lightning talks</h4>
<div>
<span style="background-color: white;"><b>Monday</b></span></div>
<div>
<div>
<br /></div>
<div>
1 Transifex django translate .po file generator</div>
<div>
<br /></div>
<div>
2 spotify help page</div>
<div>
<br /></div>
<div>
3 windows conversion</div>
<div>
<br /></div>
<div>
4 egenix pyrun</div>
<div>
<br /></div>
<div>
5 django lukaz comic django re-engineering - 'I regret nothing' - story</div>
<div>
<br /></div>
<div>
6 the scraper wiki guy - julian</div>
<div>
<br /></div>
<div>
7 sqlalchemy tim - add audit history</div>
<div>
<br /></div>
<div>
8 PSF - all can go! Napoli</div>
<div>
<br /></div>
<div>
9 recipe share - flask based openstate.eu</div>
<div>
<br /></div>
<div>
10 fund raiser</div>
<div>
<br /></div>
<div>
11 rededdy online student python math runner</div>
<div>
<br /></div>
<div>
12 kivy.org - ui interface toolkit - mobile</div>
<div>
<br /></div>
<div>
13 python brochure for marketing</div>
<div>
<br /></div>
<div>
14 pyscopg and pypy - help out with moving to pure python ctypes implementation</div>
<div>
<br /></div>
<div>
15 micro framework</div>
</div>
<div>
<br /></div>
<div>
<b>Wednesday</b></div>
<div>
<br /></div>
<div>
<div>
1 PCRE - perl like regex instead of re</div>
<div>
<br /></div>
<div>
2 october pycon s africa</div>
<div>
<br /></div>
<div>
3 docbook to sphinx</div>
<div>
<br /></div>
<div>
4 will hardy - nlp and geocoding</div>
<div>
<br /></div>
<div>
5 moin moin - whoosh search</div>
<div>
<br /></div>
<div>
6 python anywhere</div>
<div>
<br /></div>
<div>
7 pycon uk Sept Coventry</div>
<div>
<br /></div>
<div>
8 controlling telescopes South Africa</div>
<div>
<br /></div>
<div>
9 music analysis with nlp and ai</div>
<div>
<br /></div>
<div>
10 massage man</div>
<div>
<br /></div>
<div>
11 jbart rest based mobile mashup toolkit</div>
<div>
<br /></div>
<div>
12 social fitness game - fitocracy.com</div>
<div>
<br /></div>
<div>
13 pyramid antipatterns & pycon japan</div>
<div>
<br /></div>
<div>
14 django is too simple - new framework made from bits: sqlalchemy, jinja</div>
</div>
<div>
<br /></div>
<div>
<b>Friday</b></div>
<div>
<br /></div>
<div>
Didn't record them I am afraid.<br />
<br />
---------------<br />
<br />
The best lightning talk was the 'I regret nothing' one, and the most impressive was the South African guy controlling the largest radio telescope array in the world live, via a python desktop app, with a video camera showing the array moving around.</div>
<div>
NB: Also Higgs boson got some coverage during the conference since the data sensing input and big data parsing software is largely Python.</div>
<div>
<br /></div>
<h4>
Books from the O'Reilly stall that I noticed may be an entertaining read</h4>
<div>
<div>
Confessions of a Public Speaker - Scott Berkun</div>
<div>
Seven Languages in Seven Weeks<br />
Maybe time to download and read <a href="https://bitbucket.org/BruceEckel/python-3-patterns-idioms/" style="background-color: white;">https://bitbucket.org/BruceEckel/python-3-patterns-idioms/</a></div>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-73982909515239068232012-07-07T22:07:00.002+01:002012-07-08T01:47:30.019+01:00Europython 2012 - hot stuffBeing English I often tend to start a new conversation with the weather. At <a href="https://plus.google.com/u/0/photos/101147159010203967702/albums/5761050058536817313">Europython this week</a> I had good reason: it was hot, 30 - 35 degrees centigrade, whilst at home the UK has been bathed with ... rain, and temperatures of 20 at most. Of course for Florentines it is only a couple of degrees above the norm, so nothing worth talking about. However they were polite enough to respond to this, or any other opening conversational gambit I offered, and in general I found Europython to be a very social event this year in terms of meeting new people - probably more so than any previous Python (or Python framework) conference I have been to.<br />
<br />
At this year's conference I attended on my own, and hence I made a bit more of an effort to be sociable. This along with luckily getting a poster session (that could help justify work sending me!), were prompts to try and start conversations where I may normally have been more reticent.<br />
<br />
The conference itself has a great atmosphere for mixing in any case, with possibly four main themes: core language and implementation development; web applications; data mining and processing; and configuration management and automation tools. Of course within these there are divisions - the investment banking iPython analysers vs. the applied science academic researchers, or Pyramid vs. Django, etc. - but it seems everyone can usefully share ideas, whether they are sales engineers from a large company or a pure hobbyist.<br />
<br />
This inclusiveness was also a theme in itself, particularly wrt. women. Kicking off with Alex Martelli's keynote about <a href="http://web.mit.edu/invent/iow/hopper.html">Grace Hopper</a>, the inventor of the compiler, along with a lot of other stuff. <br />
Unfortunately women are under-represented in the coding sector. At work I think it's around 20% for programmers, but even that is higher than the average - probably because we are public sector / unionised. This is reflected by much lower membership of our local DBBUG Django group, who are mainly drawn from the commercial sector, with only 2 out of around 50 active members. Europython was as bad, at 4% last year, but that has doubled this year to around 60 of the 750 attendees.<br />
<br />
Returning to Python themes. The chance to chat to the data miners was most useful, since we are currently in a state of transition at work. Having been involved in internal systems, particularly CMS from the days when it was evolving 10 years ago, we are now moving to a more pure R&D role.<br />
This means that CMS work is to be dropped, and whilst we want to continue large custom web application work related to research (that's where my poster session on our Django survey system comes in), we also want to be moving towards work that ties up with the University's applied science research - especially big data mining and the like.<br />
So for me the chance to talk (and listen) to people across a range of disciplines was ideal.<br />
<br />
Lastly, I also realised how stale my knowledge of the new features of the language is. Time to get a book on Python 3 - and get back on track I think. Oh and of course many thanks to the Italian Python community and conference organisers for a really great conference - and more than my fair share of free cocktails - which certainly helped break the ice.<br />
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-40915226230932404762012-06-16T10:30:00.003+01:002012-06-16T18:10:57.180+01:00Talks vary, especially mineThis week I had the opportunity to go to a couple of gatherings and deliver two python related talks. The first was at our local <a href="https://groups.google.com/forum/?fromgroups#!forum/dbug">Django Bath and Bristol Users Group</a> (#DBBUG). The second was at the
<a href="http://geug12.port.ac.uk/">Google Apps for EDU European User Group</a> (GEUG12).<br />
<br />
I don't do a talk that often, maybe 5 or 6 times a year, and I guess I share some of the traits of the stereotypical geek, in that I am not a natural extrovert who wants to be up on stage. But at the same time, there is a bit of an adrenalin rush and subsequent good feeling (maybe just relief it's over!) if a talk seems to go OK.<br />
<br />
The first talk went well, although perhaps that was just from my perspective, having consumed a generous quantity of the free beer beforehand, provided by the hosts, <a href="http://p.ota.to/">Potato</a>. It was good to see a lot of new faces at the meeting, and hear about interesting projects from other speakers.<br />
My talk was hardly rocket science, instead just a little session on the <a href="https://docs.google.com/present/view?id=0AaTNd6XgEH_LZHg4ejk2cV81MWhqbXo5aGZu&hl=en_GB">basics of python packaging</a> and why it's a good thing to do. But it seemed to me it was pitched at about the right technical level for the newbies to get something from it, and for the more experienced to contribute comments. It was paced about right and I pretty much followed the thread of the slides, i.e. from 'what is an egg' to C.I. and local PyPI, without slavishly reading out from them. The atmosphere was relaxed and people seemed interested. All in all a good session.<br />
<br />
The second talk unfortunately did not go well, even though I spent more time preparing it - it even triggered me into upgrading my <a href="http://edcrewe.com/">personal App Engine</a> site (see previous blog posting), which as described took a good few hours in itself, to make sure my App Engine knowledge was up to date. So what went wrong? Well maybe, with the previous talk going well with little preparation and no rehearsal, I had started to get a bit blasé. Hey look at me, I can fire off a talk no problem, don't worry about it. However to some extent the quality of any talk depends as much on the audience as the speaker - it's an interactive process - and for a few reasons I didn't feel that I had established that communication. It threw me, I guess, to the point where I was really stumbling along at the end.<br />
<br />
So I thought I would try and analyse some of the reasons, to try to avoid it happening again. These are all common sense and probably in any guide to public speaking - but I thought it worth writing down - even if it's only for my benefit!<br />
<br />
The core reason was that there was a disconnect between the audience and what I was talking about. So I wasn't preaching the gospel to the humanist society - or Zope at a Django conference ;-) I was talking about how you can use <a href="https://docs.google.com/presentation/d/1iWKk8_BRTFP6pPIbL8BM0knRIP0zy6qwsPlaBn1bEyo/present#slide=id.g2526149c_1_14">App Engine as a CMS</a> to a group of University educational technology staff - managers, learning support and some developers.<br />
<br />
So first mistake was that I had adapted a talk that I had delivered to a group of developers a year before. It had gone well before because the audience, like me, were not interested in using the tools - they were interested in building the tools, how these tools could be used to build others - and what implications that had for how we should approach building tools in the future.<br />
<br />
Lesson 1: Try to get an idea of your audience's background - then write the talk tailored for them from scratch (even if it's a previous talk - unless you know the audience profile hasn't changed). Also if a previous demo and talk with questions was an hour - and now it has to be done in 20 minutes - rewrite it, or at least delete half the slides - but don't expect to talk three times faster!<br />
<br />
Lesson 2: If you do feel that you might have pitched a talk at the wrong technical level - and there is no time to rewrite or rethink it - it's probably best to just deliver it as it stands. Moderating all the slides with 'sorry too techie' and rephrasing things in layman's terms on the fly is probably going to be less coherent, and lose the thread of the talk anyhow - unless you are a well experienced teacher.<br />
<br />
My first slide was entitled APIs in transition - hmmm that was a mistake, a few people immediately left the room.<br />
<br />
Lesson 3: The most interesting initial thing to me, coming back to my site, was all the changes that had occurred with the platform. However if you haven't used it before that is irrelevant. So remember, don't focus on what you last found interesting about the topic - focus on the general picture for somebody new to it.<br />
<br />
Lesson 4: Don't start a talk with the backend technology issues. Start it with an overview of the purpose of the technology and ideally a demo or slide of it in use. However backend-focused your topic, it's always best to start with an idea of it from the end user perspective - even when talking to a room full of developers.<br />
<br />
When I got to the demo part I skipped it due to feeling time pressure - actually however this would have been best acting as the main initial element of the talk, with all the technical slides skipped and just referred to for those interested, finishing with the wider implications regarding sites based around mash-ups driven by a common integration framework. So ditching most of the technical details.<br />
<br />
Lesson 5: Don't be scared to reorganise things at the last minute, before starting (see Lesson 2) - if that reorganisation is viable, eg. in terms of pruning and sequence.<br />
<br />
<br />
There was a minor organisational issue in that I started 5 minutes before I was due to end a 20 minute talk, with no clock to keep track. So there was a feeling of having over-run almost from the start. Combine that with people leaving, or even worse people staying but staring at you with a blank 'You are really boring' look!<br />
<br />
Lesson 6: Check what time you are really expected to end before you start, and get your pace right based on this. Keep looking around the audience and try to find at least some people who look vaguely interested! Ignore the rest - it is unlikely you can ever carry the whole audience's interest - unless you are a speaker god - but you need to feel you have established some communication with at least some members of it, to keep your thread going. <br />
<br />
<br />
OK well I could go on about a number of other failings ... but hey, I have spent longer writing about it than I did delivering it. So that will do, improve by learning from my mistakes, and move on.<br />
<br />
<h3>
User groups vary too</h3>
As a footnote another difference was the nature of the two user groups.<br />
<br />
The DBBUG group was established and is entirely organised by its members, like minded open source developers, who take turns organising the meetings, etc. It's really just an excuse for techies to go to the pub together on a regular basis - and not necessarily always chat about techie stuff. It's open to anyone and completely informal.<br />
<br />
GEUG is also largely organised by its members taking turns, but was originally established by Google's HE division for Europe and has a lot of input from their team; it requires attendees to be members of customer institutions. So essentially it's a customer group and has much more of that feel. Members attend as a part of their job. Google's purpose is to use it to expand its uptake in HE - by generating a self supporting community that promotes the use of its products and trials innovative use cases, to some extent feeding back into product development. With a keynote by Google and all other talks by the community members. Lots of coloured cubes, pens, sports jackets - and perhaps a slightly rehearsed informality. But interestingly quite open to talks that perhaps didn't praise their products, or demonstrated questionable use cases regarding the usual bugbear of data protection. Something that is a real sore spot within Europe apparently - the main blocker to cloud adoption in HE.<br />
<br />
Having once attended a Microsoft user group event in Dublin at the end of the 90s, I would say that this was a long way removed from that. The Microsoft event was strictly controlled, no community speakers, nothing but full sales engineer style talks about 'faultless' products, there was no discussion of flaws or even of approaches that could generate technical criticisms. Everybody wore suits - maybe that is just the way software sales were way back when Microsoft dominated the browser and desktop.<br />
<br />
Whereas now community is where it is at. Obviously GEUG felt slightly less genuinely community-driven after DBBUG, but I would praise Google in that they are significantly less controlling over shaping a faultless, technical, suited business face to HE than some of their competitors. Unfortunately for them, non-technical managers with their hands on the purse strings tend to be largely persuaded by the surface froth of suits and traditional commercial software sales - disguise flaws, rather than allow discussion of whether and how they may be addressed.<br />
<br />
In essence the Google persona is carefully crafted to sit closer to an open source one, but as a result may suffer from the same distrust that traditional non-technical clients have for open source over commercial systems. Having said that, they are not doing too badly ... dominating cloud use in US HE. Maybe Europe HE is just a tougher old nut to crack.<br />
<br />
<br />
<br />
<br />
<br />
<br />Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-57161762687674456702012-06-05T15:21:00.000+01:002012-06-05T16:32:41.989+01:00Upgrading a Google App Engine Django appIn my day to day work, I haven't had an opportunity to use Google App Engine (GAE).
So to get up to speed with it, and since it offers free hosting for low usage, I <a href="http://edcrewe.blogspot.co.uk/2011_05_01_archive.html">created my home site</a> on the platform a year ago. <a href="http://www.edcrewe.com/">The site</a> uses App Engine as a base, and integrates in Google Apps for any content other than custom content types.<br />
<br />
Recently I have been upgrading the Django infrastructure we use at work, from Python 2.6 to 2.7 and Django 1.3 to 1.4. I thought that, after a year, it was probably time to upgrade my home site too, having vaguely registered a few changes to App Engine being announced.
On tackling the process, I realised that 'a few changes' is an understatement.<br />
<br />
Over the last year GAE has moved from a beta service to a full Google service. Its pricing model has changed, its backend storage has changed, the python deployment environment has changed and the means of integrating the Django ORM has changed. A raft of other features have also been added. So what does an upgrade entail?<br />
<br />
<h3>
Let's start with where we were in Spring 2011</h3>
<ol>
<li>Django 1.2 (or 0.96) running on a Python 2.5 single threaded CGI environment</li>
<li>The system stores data in the Master/Slave Big Table datastore</li>
<li>Django's standard ORM django.db is replaced by the NOSQL google.appengine.ext.db *</li>
<li>To retain the majority of forms functionality ext.djangoforms replaces forms *</li>
<li><a href="http://code.google.com/p/gdata-python-client/">python-gdata</a> is used as the standard means to integrate with Google Apps via common RESTful Atom based APIs<br /><br />* as an alternative <a href="http://www.allbuttonspressed.com/projects/django-nonrel">django-nonrel</a> could have been used to provide full standard ORM integration - but this was overkill for my needs</li>
</ol>
<div>
To configure GAE a typical app.yaml would have been the following:</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">
application: myapp
version: 1-0
runtime: python
api_version: 1

handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin
- url: /.*
  script: main.py
- url: /media
  static_dir: _generated_media
  secure: optional
</code></pre>
<div>
With the main CGI script to run it</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">
import os
from google.appengine.ext.webapp import util
from google.appengine.dist import use_library

use_library('django', '1.2')
os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'

import django.core.handlers.wsgi
from django.conf import settings

# Force Django to reload its settings.
settings._target = None


def main():
    # Create a Django application for WSGI.
    application = django.core.handlers.wsgi.WSGIHandler()
    # Run the WSGI CGI handler with that application.
    util.run_wsgi_app(application)


if __name__ == '__main__':
    main()
</code></pre>
<div>
<br /></div>
<h3>
What has changed over the last year</h3>
Now we have multi-threaded WSGI Python 2.7 with a <a href="https://developers.google.com/appengine/docs/python/python27/newin27">number of other changes</a>.<br />
<br />
<ol>
<li>Django 1.3 (or 1.2) running on a Python 2.7 multi threaded WSGI environment</li>
<li>The system stores data in the HRD Big Table datastore</li>
<li>For Big Table the NOSQL google.appengine.ext.db is still available, but Django's standard ORM django.db is soon to be available for hosted MySQL</li>
<li>google.appengine.ext.djangoforms is <a href="http://code.google.com/p/appengine-admin/issues/detail?id=30">not available any more</a><br />The recommendation is either to stop using ModelForms and hand-crank data writing from plain Forms - or use django-nonrel - but it does have a startup overhead *</li>
<li><a href="http://code.google.com/p/gdata-python-client/">python-gdata</a> is still used but it is being replaced by simpler JSON APIs specific to the App in question, managed by the <a href="https://code.google.com/apis/console">APIs console</a> and accessible via <a href="http://code.google.com/p/google-api-python-client/">google-api-python-client</a>.<br /><br />* <a href="http://www.allbuttonspressed.com/projects/django-nonrel">django-nonrel</a> support has moved from its previous authors - with the Django 1.4 rewrite still a <a href="https://github.com/django-nonrel/django-1.4">work in progress</a></li>
</ol>
<div>
Hmmm ... that's a lot of changes - hopefully now we are out of beta there won't be so many in another year's time! So how do we go about migrating our old GAE Django app?<br />
<br /></div>
<br /></div>
<h3>
Migration</h3>
<div>
Firstly the Python 2.7 WSGI environment requires a different app.yaml and main.py </div>
<div>
Now to configure GAE a typical app.yaml would be:</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">
application: myapp-hrd
version: 2-0
runtime: python27
api_version: 1
threadsafe: true

libraries:
- name: PIL
  version: latest
- name: django
  version: "1.3"

builtins:
- django_wsgi: on
- remote_api: on

handlers:
# Must use threadsafe: false to use the remote_api handler script?
#- url: /remote_api
#  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
#  login: admin
- url: /.*
  script: main.app
- url: /media
  static_dir: _generated_media
  secure: optional
</code></pre>
<div>
With the main script to run it just needing...</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">import os
import django.core.handlers.wsgi
os.environ['DJANGO_SETTINGS_MODULE'] = 'edcrewe.settings'
app = django.core.handlers.wsgi.WSGIHandler()
</code></pre>
<div>
But why is the app-id now myapp-hrd rather than myapp?<br />
In order to use Python 2.7 you have to move to the HRD data store. To migrate the application from the deprecated Master/Slave data store it must be replaced with a new application. New applications now always use HRD.<br />
Go to the admin console and 'Application Settings' and at the bottom are the migration tools. These wrap up the creation of a new myapp-hrd which you have to upload / update the code for in the usual manner. Once you have fixed your code to work in the environment (see below) - upload it. <br />
The migration tool's main component is for pushing data from the old to the new datastore and locking writes to manage roll over.
So assuming all that goes smoothly you now have a new myapp-hrd with data in ready to go, which you can point your domain at.<br />
<br />
NB: Or you can just use the remote_api to load data - so for example to download the original data to your local machine for loading into your dev_server:</div>
<pre class="CICodeFormatter"><code class="CICodeFormatter">
${GAE_ROOT}/appcfg.py download_data --application=myapp \
    --url=http://myapp.appspot.com/remote_api --filename=proddata.sql3

${GAE_ROOT}/appcfg.py upload_data --filename=proddata.sql3 \
    --url=http://localhost:8080/remote_api \
    --email=foo@bar --passin --application=dev~myapp-hrd ${MYAPP_ROOT}/myapp
</code></pre>
<div>
<h3>
Fixing your code for GAE python27 WSGI</h3>
Things are not quite as straightforward as you may think from using the dev server to test your application prior to upload. The dev server's CGI environment no longer replicates the deployed WSGI environment quite so well - like the differences between using Django's dev server and running it via Apache mod_wsgi. For one thing any CGI script imports OK as before on the dev server - yet may not work on upload, or may require config adjustments - e.g. ext.djangoforms is not there, and use of any of the existing utility scripts, such as the remote_api script for data loading, requires disabling of the multi-threaded performance benefits. Probably the workaround here, for more production scale sites than mine, is to have a separate app for utility usage from the one that runs the sites.<br />
<br />
If you used ext.djangoforms, either you have to move to django-nonrel or do the data writes directly. For my simple use case I wrote a simple pair of utility functions to do data writes for me, and switched my ModelForms to plain Forms.<br />
<br />
<pre class="CICodeFormatter"><code class="CICodeFormatter">
<span class="blue">def get_ext_db_dicts(instance):</span>
    <span class="green">""" Given an appengine ext.db instance return dictionaries
        for its values and types - to use with django forms
    """</span>
    value_dict = {}
    type_dict = {}
    for key, field in instance.fields().items():
        try:
            value_dict[key] = getattr(instance, key, '')
            type_dict[key] = field.data_type
        except AttributeError:
            pass
    return value_dict, type_dict


<span class="blue">def write_attributes(request, instance):</span>
    <span class="green">""" Quick fix replacement of ModelForm set attributes
        TODO: add more and better type conversions
    """</span>
    value_dict, type_dict = get_ext_db_dicts(instance)
    for field, ftype in type_dict.items():
        if field in request.POST:
            if ftype == str:
                value = str(request.POST.get(field, ''))
            elif ftype == int:
                try:
                    value = int(request.POST.get(field, 0))
                except ValueError:
                    value = 0
            elif ftype == list:
                value = request.POST.getlist(field)
            else:
                value = str(request.POST.get(field, ''))
            setattr(instance, field, value)

<span class="orange">Crude but it allows one line form population for editing ...</span>
mycontentform = MyContentForm(value_dict)
<span class="orange">... and instance population for saving ...</span>
write_attributes(request, instance)
</code></pre>
<div>
However even after these fixes and data import, I still had another task. Images uploaded as content fields were not transferred - so these had to be manually redone. This is maybe my fault for not using the blobstore for them - i.e. since they were small images they were just saved to the Master/Slave data store - but pretty annoying even so.<br />
<br /></div>
<h3>
Apps APIs</h3>
</div>
<div>
Finally there is the issue of the gdata APIs being in a state of flux. Well currently the new APIs don't provide sufficient functionality, and so given that this API move by Google still seems to be in progress - and how many changes the App Engine migration required - I think I will leave things be for the moment and stick with gdata-python ... maybe in a year's time!</div>Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-56877090550660452672012-03-27T20:46:00.029+01:002012-04-19T09:19:35.577+01:00Django - generic class views, decorator classI am currently writing a permissions system for a Django based <a href="http://www.survey.bris.ac.uk/">survey application</a> and we wanted a nice clean implementation for testing users for appropriate permissions on the objects displayed on a page.<br />
<br />
Django has added class based views in addition to the older function based ones. Traditionally, tasks such as testing authorisation have been applied via decorator functions.<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter"><span class="orange">@login_required</span>
<span class="blue">def my_view</span>(request):
    return response
</code></pre><br />
The recommended approach to do this for a class view is to apply a method decorator.<br />
There is a utility converter, method_decorator, to do this with function decorators.<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter">class ProtectedView(TemplateView):
    <span class="orange">@method_decorator(login_required)</span>
    <span class="blue">def dispatch</span>(self, *args, **kwargs):
        return super(ProtectedView, self).dispatch(*args, **kwargs)
</code></pre><br />
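To see what the conversion is actually doing, here is a stripped-down, stand-alone sketch (toy stand-ins, not Django's real implementation): a function decorator expects a plain function taking the request first, so the method version has to bind self before delegating.

```python
from functools import wraps

def login_required(view_func):
    # Toy stand-in for django.contrib.auth.decorators.login_required:
    # treats the request as a dict with an optional 'user' key
    @wraps(view_func)
    def _wrapped(request, *args, **kwargs):
        if not request.get('user'):
            return 'redirect-to-login'
        return view_func(request, *args, **kwargs)
    return _wrapped

def method_decorator_sketch(decorator):
    # Simplified idea behind django.utils.decorators.method_decorator:
    # peel off self, hand the decorator a plain function, call through
    def _dec(method):
        @wraps(method)
        def _wrapper(self, *args, **kwargs):
            def bound(*a, **kw):
                return method(self, *a, **kw)
            return decorator(bound)(*args, **kwargs)
        return _wrapper
    return _dec

class ProtectedView(object):
    @method_decorator_sketch(login_required)
    def dispatch(self, request):
        return 'ok'
```

With these stand-ins, a request dict carrying a user gets 'ok' back, while an anonymous one is bounced to login.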
However this isn't ideal since it is less easy to check all the methods for decorators than just look at the top of the view function as before.<br />
<br />
So why not use a class decorator instead, to make things clearer? Fine, except we do actually want to decorate the dispatch method. But we can add a utility decorator that wraps this up.*<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter"><span class="orange">@class_decorator(login_required, 'dispatch')</span>
<span class="blue">class ProtectedView</span>(TemplateView):
    def dispatch(self, *args, **kwargs):
        return super(ProtectedView, self).dispatch(*args, **kwargs)
</code></pre><br />
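The post doesn't include class_decorator itself, but a minimal sketch of what such a helper might look like (hypothetical names, toy login_required stand-in, not the production code) is:

```python
from functools import wraps

def login_required(view_func):
    # Toy stand-in for Django's login_required, request as a plain dict
    @wraps(view_func)
    def _wrapped(request, *args, **kwargs):
        if not request.get('user'):
            return 'redirect-to-login'
        return view_func(request, *args, **kwargs)
    return _wrapped

def class_decorator(decorator, method_name):
    """Hypothetical helper: apply a function decorator to the named
    method of a class and return the modified class."""
    def _wrap(cls):
        method = getattr(cls, method_name)
        @wraps(method)
        def _decorated(self, *args, **kwargs):
            # Bind self so the function decorator sees a plain function
            def bound(*a, **kw):
                return method(self, *a, **kw)
            return decorator(bound)(*args, **kwargs)
        setattr(cls, method_name, _decorated)
        return cls
    return _wrap

@class_decorator(login_required, 'dispatch')
class ProtectedView(object):
    def dispatch(self, request):
        return 'ok'
```

The decoration is visible at the top of the class, which was the point of the exercise.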
But Django's generic class views contain more than just the TemplateView, they have generic list, detail and update views. All of which use a standard pattern to associate object(s) in the context data. Not only that but the request will also have user data populated if the view requires a login. <br />
<br />
What I want to do is have a simple decorator that just takes a list of permissions, then ensures users who access the class view must login and then have each of these object permissions checked for the context data object(s). So my decorator for authorising user object permissions will be @class_permissions('view', 'edit', 'delete')<br />
<br />
To do this the class_permissions decorator itself, is best written as a class. The class can then combine the actions of three method decorators on the two Django generic class view methods - dispatch and get_context_data. <br />
Firstly dispatch_setuser wraps dispatch, and then login_required wraps that - so at call time login_required runs first, and the user it delivers gets set as an attribute of the class_permissions class.<br />
These must decorate in the correct order to work.<br />
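As a reminder of why the order matters: decorators apply inside-out, so the wrap applied last ends up outermost and runs first at call time. A toy stand-alone sketch (generic labels, not the real decorators):

```python
def tag(label):
    # Toy decorator factory: records which wrapper ran, in call order
    def decorator(func):
        def wrapper(*args, **kwargs):
            return [label] + func(*args, **kwargs)
        return wrapper
    return decorator

def dispatch():
    return ['dispatch']

# 'inner' is applied first, 'outer' second - so 'outer' runs first
wrapped = tag('outer')(tag('inner')(dispatch))
print(wrapped())  # ['outer', 'inner', 'dispatch']
```

Swap the application order and the call order swaps too, which is exactly what would break the user hand-off here.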
<br />
Finally class_permissions wraps get_context_data to grab the view object(s). The user, permissions and objects can now all be used to test for object level authorisation - before a user is allowed access to the view. The core bits of the final code are below - my class decorator class is done :-)<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter">
<span class="blue">class class_permissions</span>(object):
    <span class="green">""" Tests the objects associated with class views
        against permissions list. """</span>
    perms = []
    user = None
    view = None

    <span class="blue">def __init__</span>(self, *args):
        self.perms = args

    <span class="blue">def __call__</span>(self, View):
        <span class="green">""" Main decorator method """</span>
        self.view = View

        <span class="blue">def _wrap</span>(request=None, *args, **kwargs):
            <span class="green">""" double decorates dispatch
                decorates get_context_data
                passing itself which has the required data
            """</span>
            setter = getattr(View, 'dispatch', None)
            if setter:
                decorated = method_decorator(
                                dispatch_setuser(self))(setter)
                setattr(View, setter.__name__,
                        method_decorator(login_required)(decorated))
            getter = getattr(View, 'get_context_data', None)
            if getter:
                setattr(View, getter.__name__,
                        method_decorator(
                            decklass_permissions(self))(getter))
            return View
        return _wrap()
</code></pre><br />
The function decorators and imports that are used by the decorator class above<br />
<br />
<pre class="CICodeFormatter" ><code class="CICodeFormatter">from functools import wraps
from django.contrib.auth.decorators import login_required
from django.utils.decorators import available_attrs, method_decorator


<span class="blue">def decklass_permissions</span>(decklass):
    <span class="green">""" The core decorator that checks permissions """</span>
    def decorator(view_func):
        <span class="green">""" Wraps get_context_data on generic view classes """</span>
        @wraps(view_func, assigned=available_attrs(view_func))
        def _wrapped_view(**kwargs):
            <span class="green">""" Gets objects from get_context_data and runs check """</span>
            context = view_func(**kwargs)
            obj_list = context.get('object_list', [])
            if not obj_list:
                obj = context.get('subobject',
                                  context.get('object', None))
                if obj:
                    obj_list = [obj, ]
            check_permissions(decklass.perms, decklass.user, obj_list)
            return context
        return _wrapped_view
    return decorator


<span class="blue">def dispatch_setuser</span>(decklass):
    <span class="green">""" Decorate dispatch to add user to decorator class """</span>
    def decorator(view_func):
        @wraps(view_func, assigned=available_attrs(view_func))
        def _wrapped_view(request, *args, **kwargs):
            if request:
                decklass.user = request.user
            return view_func(request, *args, **kwargs)
        return _wrapped_view
    return decorator
</code></pre><br />
Although all this works fine, it does seem overly burdened with syntactic sugar. I imagine there may be a more concise way to achieve the results I want. If anyone can think of one, please comment below.<br />
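One possibly more concise route is a mixin rather than a decorator, so each view just declares its required permissions. The sketch below uses hypothetical stand-in classes in place of the Django machinery (the names and has_perm call are assumptions, not the survey app's actual code):

```python
class FakeUser(object):
    # Stand-in for a Django user with object-level permissions
    def __init__(self, perms):
        self.perms = set(perms)
    def has_perm(self, perm, obj):
        return perm in self.perms

class FakeRequest(object):
    def __init__(self, user):
        self.user = user

class BaseDetailView(object):
    # Stand-in for a Django generic view's get_context_data
    def __init__(self, request, obj):
        self.request = request
        self.object = obj
    def get_context_data(self, **kwargs):
        context = dict(kwargs)
        context['object'] = self.object
        return context

class PermissionCheckMixin(object):
    """Hypothetical mixin: subclasses declare required_perms and the
    check runs in one place, when context data is assembled."""
    required_perms = ()

    def get_context_data(self, **kwargs):
        context = super(PermissionCheckMixin, self).get_context_data(**kwargs)
        objects = context.get('object_list') or [context.get('object')]
        objects = [o for o in objects if o is not None]
        for perm in self.required_perms:
            for obj in objects:
                if not self.request.user.has_perm(perm, obj):
                    raise PermissionError('%s denied on %r' % (perm, obj))
        return context

class ProtectedDetail(PermissionCheckMixin, BaseDetailView):
    required_perms = ('view', 'edit')
```

This trades the decorator syntax for Python's normal method resolution order, at the cost of one extra base class per view.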
<br />
* I didn't show the code for class_decorator since it is just a simplified version of the class_permissions example above.Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0tag:blogger.com,1999:blog-6603837339236629698.post-76398113367719465922012-02-12T12:48:00.003+00:002012-06-16T09:00:59.613+01:00Finally I have left behind the printed wordThe printing press was invented by Gutenberg getting on for 600 years ago; as a technology the printed book has had a good innings compared to the cassette tape or the floppy disk. But maybe it's time to recognize that, at least as far as text-only mass paperback printed media goes, it's unlikely to reach that 600th birthday. So I finally decided to stop being a Luddite and buy an e-reader this Christmas, along with 1.3 million other people in Britain. I actually bought it for my partner but I have ended up hogging it.<br />
<br />
So what's so great about an e-reader? Well, it's a good excuse to get back into the classics - anything out of copyright is available free. Over the last month I managed to plough through a book apiece by George Orwell, Thomas Hardy, Charles Dickens, Joseph Conrad and Emily Bronte. But in addition to that, any book is available instantaneously either from repositories or Amazon and its competitors. Or for the less scrupulous there is the twilight world of file sharing networks - where all e-books are available for free.<br />
<br />
Once the habit is formed, reading via a Kindle or other e-reader quickly becomes just as natural as turning pages. I prefer the cheaper LCD grey scale readers without touch screen or keyboard - they are the simplest direct replacement for a paperback, and if current price wars continue they will soon cost about the same as a hardback! For that you get wireless, space for 1500 books and a month's battery life.<br />
<br />
So you may be thinking - OK big deal - but why are you talking about this on a Python blog? Well, the reason is I fancied reading other digital text formats on my e-reader - what if I want a copy of a PDF, some Sphinx docs or even a whole website, for example? I soon came across the open source <a href="http://calibre-ebook.com/">Calibre software</a>, written in Python and C and available on Linux, Mac and Windows. Kovid Goyal may not be Gutenberg - but he has certainly produced a really good e-book management software package.<br />
<br />
Once you have installed the software it sets up that machine as your personal e-book library, to which you can add pretty much any text format you wish, along with news feeds or other HTML content. Plug in your e-reader, press the convert e-book button to translate them to .mobi or whatever, and another to send them to the device. BeautifulSoup is used to help with HTML conversion and there is an API for creating <a href="http://manual.calibre-ebook.com/news_recipe.html">conversion recipes</a>. The software also includes a server to make your personal e-library available to you over the web, book repository browsers, synchronisation tools, handling of annotations, etc.<br />
<br />
Friends talk about missing the tactile experience, the smell, but with a good e-reader and good e-book management software - I can't really justify lugging around those funny old glued-together sheets of bleached wood pulp any more ;-)Edhttp://www.blogger.com/profile/09753091138104619483noreply@blogger.com0