The Current State of AI Software Generation
The user tries to describe, in standard English, the snippet of high level programming language code they want generated, and submits that prompt to the AI tool. So what are they actually asking the AI to generate, and how does it do it?
The high level language
High level programming languages are human-readable languages composed of English words and mathematical symbols, designed for the comprehension and composition of precise computer instructions. To a computer they make no more sense than English does. They have to be compiled or interpreted into computer language before they can run. A program may compile to an intermediate bytecode language, and then perhaps to human readable assembly language, before final translation into the unreadable machine code that the computer actually runs.
A programmer learns the high level language and becomes fluent in it. They can read and understand the functionality of that code, with the complexity of the machine specific implementation stripped away, leaving just the precise functional maths and English symbology that describes the computer functionality. They think in that code in order to write it.
Even then, the majority of a programmer's time is spent debugging the high level language and fixing what they have written until it is bug free, because it is difficult to think clearly in code and pre-determine every edge case.
Unlike the English language, it can succinctly describe computer functionality in a few lines.
The AI
A detailed English language description of the required functionality, plus the name of a high level programming language, is submitted to the AI tool.
It does a search of the web, e.g. Stack Overflow, for results in that code language. For chatbot use (e.g. ChatGPT) it applies an English language Large Language Model, or LLM (a numeric encoding of learned patterns of the English language), to generate a well phrased aggregation of the most popular results that match the English prompt.
For software use (e.g. Copilot) it works in just the same way, but the LLM learns aggregate translation from English to the high level software language, from code example data such as GitHub, to generate what the code syntax might be for the English description of it.
Finally it returns an untested snippet of generated high level code.
The Non-Developer
The non-developer pastes the snippet in place and tries to run the program with it included.
They may be able to puzzle out the high level language, but they don't naturally think in it, just as people without mathematical training can manage basic arithmetic but are lost when it comes to complex equations.
It seems to work around 50% of the time. When it fails, they go back to square one and try to rephrase their English prompt.
They patch together block after block of prompt-generated code: a crazy paving of a program that likely has a number of bugs and inappropriate features in it. But it kind of works, and for the non-developer that is good enough.
The code gets pushed out there with all its imperfections, and starts to populate the web of code data that is used to generate the next AI code snippet.
Or the Developer
They cut, paste and rewrite it, using it as a hint tool, or as an extension to their IDE's existing auto-code generation tools that work from templated code and language / import library searches.
Hopefully their IDE is set up to clearly distinguish between real code completions and possible generative code completions, since otherwise the percentage of nonsense code created by the generative AI pollutes the 100% reliability of IDE code completion and harms productivity.
Then they run their code and debug as usual.
At least 75% of programming time is spent not on writing code, but on making sure that the high level instructions are exactly correct for generating bug free machine code, i.e. on iteratively refining the lines of code. With code, a single comma out of place can break the whole program. When language has to be so carefully groomed, succinct minimal language is essential.
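To make that fragility concrete, here is a tiny Python illustration (the variable name is arbitrary):

```python
# One misplaced comma silently changes the meaning of a Python program.
retry_limit = 3    # an integer, as intended
retry_limit = 3,   # the trailing comma makes this a one-element tuple: (3,)

# Any later code that assumes an integer now fails at runtime, e.g.
#   retry_limit - 1  ->  TypeError: unsupported operand type(s) for -: 'tuple' and 'int'
```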
For many developers, adding an imprecise, non mathematical language such as English, one entirely unsuited to defining machine code instructions, to the job of generating such code is problematic. It introduces a whole layer of imprecision, complexity and bugs to the process, slowing it right down, and it requires developers to write a lot more sentences in English rather than quickly typing out the succinct lines of Python (or a similar programming language) that they already have in their head.
Generative AI can help students and others who are learning to code in a computer language, but can it actually improve productivity for real, full time developers who are fluent in that language?
I think that question is currently debatable, because adding yet another language to the stack of languages that must be interpreted when humans author computer code, especially one as unsuited as English, is only useful for people who are far from fluent in the software language.
Once we move beyond error prone early releases of LLMs like GPT-4, tools such as Copilot may start to become much more effective at authoring software, and actually produce code that is as likely to work first time, with about the same number of bugs, as an average software developer's first cut of the code. We may reach that point within a year or two, at which point professional software developers will need to be adept at using them as part of their toolset.
Even so, I believe the whole conception of applying AI to writing software could benefit from more work on a computer centric alternative to the current approach, which is focussed on generating plausible human language responses. That approach only dominates because of all the effort that has gone into NLP and human interaction. Taking it and bolting it on to the writing of human software languages is more about creating a revenue stream than about attempting to have AI do the main work of software development.
Until then, AI will not replace me as a software developer. It will only be another IDE tool I need to learn, in time, when it improves sufficiently to increase productivity.
NOTE - June 2024 Update
Having come back to Copilot six months later, I have come to appreciate some of its new features, so I have added a new blog post that accepts it now provides utility even for the seasoned programmer.
Another Way
Copilot and the like currently use the ChatGPT approach of a chatbot front end tied to an English language LLM to generate aggregate search engine results in a human language. But there is no domain specific machine learning knowledge about the semantics of the content, so it doesn't understand, and certainly doesn't pre-check, the code. Just as ChatGPT doesn't understand the search engine content, since currently there are no domain specific trained models for the content in the loop. If asked a question about pharmacy, it doesn't plug in one of the AI models that has learnt pharmacy and is used by that industry to aid the development of medicines. It understands nothing; it is a chatbot, just a constructor of plausible answers based on search popularity.
Similarly, Copilot has learnt how to predict what code somebody might be trying to write, but it hasn't learnt how to code.
This approach cannot lead to AI generating innovative new coding approaches or fully self-coding computers, nor can it remove the need for human readable high level programming languages.
There have been experiments with applying test driven development to AI generated code, but I have not heard of serious attempts to address the bigger picture...
- Move all functional code writing to be AI only.
- Remove the need for any high level computer language for humans to gain fluency in.
- Have AI develop software by hundreds of thousands of iterative composition TDD cycles.
- Refactor thousands of solutions in parallel to arrive at the optimum one.
- Use AI that understands the machine code it is generating by training it on the results of running that code.
- The ML training cycle must be based on running code, not on matching outputs against pre-ranked static training sets (see the sketch after this list).
- In addition to the static LLM that encodes the learning of machine code authoring, dynamic training cycles should be run as part of code composition: task based, ephemeral training models.
- Get rid of the wasted effort training AI to understand English, Python, Java, Go or any other existing human language evolved for other tasks.
- Finally we are left with the job of telling the computer what its software should do.
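As a very rough sketch of the "training signal is running code" point above: the signal that drives learning would be the pass rate from actually executing each candidate, rather than its similarity to a static corpus. Everything below is hypothetical, including the assumption that candidates arrive as runnable binaries:

```python
import subprocess

def execution_score(binary_path: str, test_cases: list[tuple[str, str]]) -> float:
    """Run a candidate binary against (stdin, expected stdout) pairs and return
    the fraction of cases it passes. This pass rate, not a match against a
    pre-ranked static training set, is what would drive the training cycle."""
    passed = 0
    for stdin_data, expected in test_cases:
        try:
            result = subprocess.run(
                [binary_path], input=stdin_data, capture_output=True,
                text=True, timeout=5,
            )
            if result.returncode == 0 and result.stdout.strip() == expected.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a candidate that hangs simply scores zero on this case
    return passed / len(test_cases)
```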
For that last job of telling the computer what its software should do, we do not want to use English: it is far too verbose and inaccurate. Nor do we want a full high level programming language. We need a new halfway house: a domain specific language (DSL) for defining functionality only, designed for giving software specifications to an AI, which it can use to generate automated test suites.
Self-Programming Computers
Exploring the last point in more detail...
Create a higher level pseudo-code language for describing the required functionality that is more English readable than even current high level languages such as Python.
Make that functional DSL focus on defining inputs and outputs - not creating the functionality, but creating the black box functional tests that describe what the working code should do.
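No such DSL exists yet, so as a stand-in, here is a minimal sketch of the idea using plain Python data structures for the specification and the standard unittest module for the generated black box tests. The spec format, the clamp example and every name here are invented purely for illustration:

```python
import unittest

# A hypothetical functional specification: inputs and expected outputs only,
# with no hint of how the behaviour should be implemented.
SPEC = {
    "function": "clamp",
    "cases": [
        {"inputs": (5, 0, 10), "output": 5},
        {"inputs": (-3, 0, 10), "output": 0},
        {"inputs": (42, 0, 10), "output": 10},
    ],
}

def build_test_suite(spec: dict, implementation) -> unittest.TestSuite:
    """Expand the declarative spec into a black box test suite that can be run
    against any candidate implementation, human or machine generated."""
    class FunctionalSpec(unittest.TestCase):
        pass

    for i, case in enumerate(spec["cases"]):
        def make_test(case=case):
            def test(self):
                self.assertEqual(implementation(*case["inputs"]), case["output"])
            return test
        setattr(FunctionalSpec, f"test_{spec['function']}_{i}", make_test())

    return unittest.defaultTestLoader.loadTestsFromTestCase(FunctionalSpec)
```

A candidate implementation, whether hand written or machine generated, either satisfies that suite or it does not; the spec never says how clamp should be coded.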
Maybe add tools for a partially no-code approach, with visual generators for the language, e.g. graphical pipeline builder tools, for people who find thinking visually easier than thinking symbolically.
The software creator uses the DSL to create an extensive set of functional definitions for a project.
The DSL's design and evolution are optimised for LLM interpretation, so it has very tight grammatical and syntactical rules that promote accurate generative outputs.
In effect, a new non-developer-friendly high level pseudo code language: a rigorous AI prompt writing lingo.
Some basic characteristics of the DSL:
- auto-formatting (like Go) minimizing syntactical variation
- To quote the Zen of Python: 'There should be one-- and preferably only one --obvious way to do it.'
But strictly applied, rather than left as the vague principle it is in Python. Unlike any other high level language, the design needs to be optimised only for specifying functionality: a high level templating language from which test suites are generated.
- the language will never be used to implement functionality
- uses simple English vocabulary and ideally minimal mathematical symbology
These DSL prompts are written with the help of an LLM trained on the DSL itself, so it helps create its own prompts, and the code creator uses it to refine all the DSL definitions that specify the full functionality.
The specification DSL auto-generates all the required tests in a low level language, since the system should also have a generative AI LLM trained on C or assembly language. That LLM is what creates the actual functional code, by iteratively running and rewriting it against the specification encoded into the tests.
The AI tool then generates the tests from that specification and uses TDD to generate the actual functional code. Eventually the system should improve to a level better than most software developers. The code it writes no longer needs to be read by a human, because a human would be unable to debug it at anything like the speed the AI tool can.
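Sketched as a loop, with every name hypothetical (generate_code stands in for whatever model writes the implementation, run_tests for the test suite derived from the specification DSL), the idea might look like this:

```python
def synthesise(spec, generate_code, run_tests, max_iterations=100_000):
    """Hypothetical outer loop: generate a candidate, run it against the
    spec-derived tests, and feed the failures back until everything passes.
    The human only ever touches the spec; the candidate code is never read."""
    feedback = None
    for _ in range(max_iterations):
        candidate = generate_code(spec, feedback)  # model proposes an implementation
        failures = run_tests(spec, candidate)      # black box tests from the spec
        if not failures:
            return candidate                       # specification satisfied
        feedback = failures                        # failing cases drive the next attempt
    raise RuntimeError("specification not satisfied within the iteration budget")
```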
So we use generative AI to do the part of the job that actually takes all the time: debugging, refactoring and maintaining the code, and making sure it really does what is functionally required, rather than the quick job of writing a first cut that might run without crashing.
Most importantly, we don't introduce the full English language, the language of Shakespeare, of puns, double meanings, multiple interpretations, shades of grey, implied feelings and emotions, into a binary world to which it is entirely unsuited.
Also we don't need English or high level computer languages in the stack of mistranslation at all.
We are not training the AI to understand human languages; we are training it to write its own machine code based on a definition of the behaviour it should implement.
BDD / TDD generative AI if you like.
Humans no longer learn complex, mathematical, process based languages that can be translated into machine code. They learn a more generic language for specifying functional behaviour.
This gives the DSL the freedom to mature into a general, precise AI prompt language.
Whilst allowing computers to evolve more machine learning driven software architectures that are self maintaining and less constrained by the models imposed by current programming languages, which are rooted in human intelligence and coding practice.
Could AI take my job?
Perhaps if all of the above were in place, we would finally arrive at a point where AI could replace traditional software development and high level software languages. With concerted effort it could happen within 10 years, if some big companies put serious investment into trying to replace traditional software development.
Code monkeys would all be automated. Only software architects would be required, and they would use a new functional specification AI prompt language, not a programming language.
Of course, if politicians are scared that a dumb ChatGPT can already write as good a speech as they can, while replicating all the prejudices and errors of its training data and trainers, then setting AI free to fully write software, and itself, will be far more scary in its long term implications.
Meanwhile, we are currently at a point where it arguably doesn't even improve productivity for an experienced software developer; it only allows non-developers, students and other language newbies to have a go at writing one of the many dialects of human language known as computer languages.
Their mix of maths, English, symbols, logic and process may look more like English than musical notation or pure maths does, but sadly that makes them no better suited to creation by an English language chatbot approach.