Sunday 2 August 2015

Using Java Spring & MyBatis for dynamic schema integration

Using Spring MyBatis for dynamic schema integration may perhaps be subtitled, "Eating soup with a fork" ... since that is how it has felt at times. However that may partially be due to my lack of familiarity with these tools. So this blog posting is about why I got that feeling, how I bent my fork to the purpose ... and if I missed some tricks in doing that. Please let me know by commenting below :-)

Object-relational mapping - a personal history

I have been developing relational database applications for many years so I have been through the various stages of database persistence approaches. Starting with fairly raw SQL development or at most just fixed DAO wrapping. Through SQL templating languages and mapping approaches to largely adopting the full object-relational mapping, ORM, approach for the past few years. I even tended to use the same pseudo ORM top level for NoSQL sources, although of course a lot of ORM features then become inapplicable, such as use of an ORM model's relational navigation (eg. person.department.head.surname)

Having said that the object-relational impedance mismatch can still bite. However good an ORM is there is some need for understanding of what its API simplifications and abstractions are doing. To avoid excessive  / badly performant SQL. But given that caveat, the amount of time saved in development and maintenance by a well implemented full ORM inevitably saves money on all but the simplest persistence requirements. Similarly the automation involved in schema migration, constraint, indexing and other data modelling needs is invaluable. As long as one knows what it is doing under the hood and hence when that 10-20% of database development / customised data design is required. For example custom functions and triggers, data writable views that present as a table for the ORM, etc.

MyBatis - is it an ORM I see before me?

Recently I have started working with a team who use an older Java technology set of Spring and MyBatis. My current project uses the Java MyBatis SQL framework to integrate various relational database sources.
MyBatis badges itself as a persistence framework. So it is not really claiming to be an ORM. It started as an SQL templating engine and that is still where its core skills lie. But it now has many other features bolted on. Including meta programming to generate Java relational mappers from database sources.

So it does have many ORMish features. However using MyBatis has been rather a painful journey for me, since I have found that it fails to implement many ORM design pattern principles that my ORM habituation had come to expect. Also using MyBatis with Spring MVC means we havent got proper MVC since the model is not a model as it would be with Hibernate ... its a MyBatis Mapper.
This is not just a minor niggle, on first use a core ORM feature is gone. A Mapper is not an object model of a table's row of data, with transaction and session management under the hood. Instead it is just a convenient object to hook up standard data queries and updates to with the management of saves and synchronisation of actions on Mappers and results returned by them, entirely manual. So in that sense you have less than what some more basic DAO wrappers give you, ie. when you update a record then fetch the values from that record they will return the data written into and queried back from the database, hence an inserted saved model would automatically hold the new primary key(s).
With Mappers the data readable from the Mapper will just be the data set to be updated not the actual data held in the database as a result of that update. To find out the result of your query you need to ensure you get a fresh query result, by requerying or flushing session.

There is a work around for this common requirement, you can annotate an insert with a custom select query to tell MyBatis to re-query just for the key manually before or after doing the update. However I found this didn't work well for Oracle and was a problem anyway with what I required. Largely because I was using Generator based classes, so adding custom annotations to these would not have worked for me.

@SelectKey(statement="SELECT MY_SEQUENCE.NEXTVAL() FROM DUAL", keyProperty="nameId", before=true, resultType=int.class) 
int insertMyTable(Name name); 

Similarly there is no lazy update concept, updates are only and always done when you specifically run them. In effect its like using an ORM but disabling all the relational modelling, data synchronisation machinery and transactional management. Mappers are instead just convenience wrappers for running SQL - MyBatis Generator may do the ORM process of creating Java classes that (via XML) map to database tables or views. But mapper instances are not ORM synchronised table row objects.

The normal object centric approach is also not available. Whilst you can generate code from your database, you cannot generate your database from code. So full persistence data cycle management is not available out of the box. By that I mean the use of test harness and deployment tools that destroy and recreate the full database, populate it from fixtures, run tests and drop it again. A migration tool that introspects databases and can migrate back and fortbr /br /br /h between schema versions to sync them with code versions or vice versa, ie code -> data, schema management, in addition to doing the MyBatis Generator data -> code direction. So the sort of thing that a full stack web framework like Django has with its ORM.

This was a blow for my particular project since full data cycle management is exactly what I required for developing a central data aggregation and integration database. I also required proper fixture management for testing and the same serialized data input / output components of this being used for the main work of data aggregation from data sources in serialized format (XML, CSV etc.)
Finally the system was dealing with pulling in and delivering back out complex, evolving schemas from source to consumer databases. The one thing that would be essential is that all the schema handling and data typing be dynamic so that the code can handle this change rate without constantly breaking and having to be rewritten. So hard coded schema details and static typing must be avoided for any chance of maintainability. But it was to be written in a statically typed language. Again the tool was not suited to the job. So how did I bend this seemingly unsuitable platform to the task in hand.

Bending the fork

The first step was to tackle the dynamic schema vs. static typing issue. Java reflection is your friend here. Then reinvent some of the missing ORM wheels. Fixture loading, schema generation.
Along with working around the lack of object data synchronisation and automatic session management. Finally we wanted to make our data population speedy by only updating data that needed updates. So a hashing mechanism that could work with the serialized data was required.
Note that our in house Java standards decree that all configuration should be done via annotations rather than XML where available ... so that is pretty much everything in Spring ... aside from the core MyBatis Mappers XML files. The aggregation database was Oracle and various database's including MS SQL Server were the data sources.

Dynamic schema handling

Use MyBatis Generator, adding a config file for each database source and the target aggregation database.
Add the name of each table or view required for them to these config files ... eg.
<table tableName="PERSON" modelType="flat"></table>

ideally that would be all the schema customisation required as long as data source and target naming of tables and columns matched. However in a handful of cases mapping data (eg. tableA.colB = tableC.colD) was needed in the properties files.

I ran this as part of the Maven build at first, but it proved more manageable to be able to run code generation for the sources and target databases at different times, so  I switched to wrapping this up in a command class for triggering independently via a batch job scheduler. In the same way as the data population commands were to be run. Schema -> code generation tended to be much faster than either data population or code -> schema generation, so freqprepreuent execution was not an issue.

The core class I wrote for enabling dynamic schema handling was called TableMeta and each table has an instance of this class listing all the metadata about its columns and primary keys etc.

This class was injected into a ModelFactoryService whose job is to return results or add, edit and delete models from any Mapper.
The Factory Service has a reflection based method for invoking the MyBatis generated Mapper methods, e.g. "SelectByExample"  ...

  protected Object invokeMapperMethod(String methodName, Object parameter) {
    Class klass = exampleClass; 
    if (parameter != null) {
      klass = parameter.getClass();
    } else {
      parameter = newExample();
    }
    try {
      Method method = mapperClass.getDeclaredMethod(methodName, klass);
      try {
        // Run the method  
        return method.invoke(mapper, parameter);
      } catch (IllegalAccessException | IllegalArgumentException exRun) {
        Logger.getLogger("ModelFactoryService").log(
         Level.SEVERE, modelClass.getSimpleName() + "." + methodName, exRun);
      }
    } catch (NoSuchMethodException | SecurityException exCall) {
      Logger.getLogger("ModelFactoryService").log(
         Level.SEVERE, "Mapper class has no " + methodName + " method", exCall);
    }
    return 0;
 }


... similarly there were reflection based set and get column methods surfaced by whole Mapper data modification methods such as doUpdate(Object model)
The TableMeta caters for initial setup of the classes generated for that particular table by MyBatis Generator. Straight use of Class.forName(className) works with manipulation of the table name string copying that of the Generator's standard naming convention. The table's Mapper, Model and Example classes are then added as the template classes for the Factory. From that a Setters and Getters hash of the table's Mapper methods for each column can be built automatically.

Fixture loading

A sax parser based loader class reads in data fixtures in XML format. Each fixture row is parsed to a hash of tag name to value which can be passed via a SaverService to the ModelFactory.
An ObjectConvertor static class caters for de-serialising fixture data to the correct Java types looked up via the column name from TableMeta, and the Setter hash can then be used to update the matching named columns in the model.

Fixture dumping

For fixture dumping I cheated a little. Since I was only ever dumping from the target Oracle database then rather than reinventing a full serialisation of models (e.g. ORM with built in serialisation from any database to xml, json, yaml, csv etc.) I just provided a tool to do the minimum required for my testing and development needs - to just be able to serialise any table or view to XML from Oracle.
So in this case, to avoid the maintenance madness of a separate statically typed fixed schema solution for each table - I did the XML part in Oracle.
That way I could use a GenericMapper which generated two string fields based on a dynamic SQL query built from the TableMeta. The first being the row XML and the other the concatentated list of the serialised columns to be contained within it. MyBatis @SelectProvider annotation allows the gluing on of a method taken from another class to generate SQL ...

 @SelectProvider(type = TableDump.class, method = "getXMLProvider")
 List<GenericModel> getXML(@Param("meta") TableMeta meta, @Param("rows") final int rows);

The method glued on is this one which uses Oracle's native XML methods and the TableMeta's list of columns to generate a query that directly returns the XML..

 /**
   * Dynamic SQL generator method generates XML output for fixture as row and cols 
   */
  public String getXMLProvider(Map params) {
    final TableMeta meta = (TableMeta) params.get("meta");
    final int rows = (int) params.get("rows");

    String sql = new SQL() {
      {
        String rowXML = " '' || XMLElement(\"row\", XMLAttributes(";
        for (String col : meta.getPKeys()) {
          rowXML += col + " AS \"" + col.toLowerCase() + "\", ";
        }
        rowXML = rowXML.substring(0, rowXML.length() - 2) + ")) \"ROWKEYS\"";
        SELECT(rowXML);
        String queryXML = "''";
        col_loop:
        for (String col : meta.getCols()) {
          queryXML += " || XMLElement(\"" + col.toLowerCase() + "\", " + col + ")";
        }
        SELECT(queryXML + " \"ROWCOLS\"");
        FROM(meta.getTableName());
        for (String col : meta.getPKeys()) {
          ORDER_BY(col);
        }
        if (rows > 0) {
          WHERE("rownum <= " + rows);
        }
      }
    }.toString();
    return sql;
  }

This above snippet shows an example of MyBatis SQL templating - the core of MyBatis.
The result is a single Generic Mapper that by calling with the appropriate tables TableMeta can return an XML dump of any table in the target database. For test fixture generation this can be passed a result length ... since usually ten or twenty will be sufficient for integration testing.

Save methods

I won't go into the save methods in detail - suffice to say the data was hashed on the way into Oracle and then a query of primary keys to last modified hashed data returned of all rows to allow incremental update by hashing each of the XML data sources rows and checking it first to determine if an update or insert was required. Whilst this could not take advantage of databases or Java's hashing - since it had to be serialised data compatible - it did have a significant impact. Since the rate of incremental updates meant that we are getting a maximum of 1% data churn per table, so even with all the hashing and comparision overhead - an incremental update is still around 30 times faster.

For versioned data a core data table and a versioning table was required and this then needed both to get access to the new primary keys. As mentioned this is not straight forward in MyBatis - so the easiest solution was to have a bespoke version Mapper that just wrapped a single versioning sequence and could be used to update the related data and version tables by calling custom VersionMappers nextVal or currVal methods.

Code Schema Cycle

As mentioned MyBatis doesnt cover the code to schema half of the persistence cycle, so the option was to employ a separate full schema life cycle framework such as Liquibase. Or roll my own. In this case my requirements were not for extensive migration features. Since as an aggregation database the schema could be snapshotted dropped and rebuilt. So to avoid complexity and further dependencies I just added a tool to build Oracle schema from standard dumps of the schema from an Oracle client tool. So you just dump each object to a separate DDL SQL file, then it checks all the files works out which object type they relate to and loads them in a sequence which should prevent referential integrity clashes, ie.

DATABASE LINK, SEQUENCE, TABLE, MATERIALISED VIEW, FUNCTION, VIEW,  INDEX, PROCEDURE, TRIGGER, CONSTRAINT, FOREIGN KEY

This is wrapped up as a DatabaseReset command that either drops all data or the full database and rebuilds the schema. So a run of  reset followed by the GenerateMappers command gives a freshly built persistence layer and Model code to talk to it. It would be nice to have the schema built from the Mappers code and a more standard coherent code schema cycle, but given Mappers are in use, that would not be available even if Liquibase were in the mix.

Spring issues

So as a newbie to MyBatis I guess I had some problems with it. But as long as this slightly irascible post is. Guess what, I also had issues with Spring too ... so I will make it even longer by getting those off my chest too :-)

It seems to require huge amounts of configuration to do what you want. Again it can be made to do most things, but ... and maybe to some extent ... because of that ... it takes an awful lot of configuration to do some of the things a more opinionated full stack web framework would do out of the box. I guess I need to come across the sort of edge cases which are really difficult in a full stack framework, to appreciate Spring, since currently it feels like it needs a great deal of maintenance heavy tinkering to do some of the basics.

So the first surprise was that all my Spring @Service classes default to singletons. So in order to have for example a ModelFactory for the two different tables a versioned update required I needed to build a BeanFactory and make these beans and annotate them as prototypes - ie normal classes not singletons. I guess this is because all of this related to command classes not Spring MVC web classes ... which would have had web session scope.

  @Bean
  @Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
  public ModelFactoryService modelService() {
    return new ModelFactoryService(prop, hashService());
  }

So all my indvidually injected service classes tended to have to be uninjected and instead the bean factory injected in their place.

Along with that the database session to mapper class connection seemed rather frail.
The only way to ensure this worked was never to use a generic session handling method but always use separate prototype beans for separately named session classes annotated to find the correct mappers for that particular database ...

@Configuration
@MapperScan(basePackages = "uk.ac.bris.adt.erp.dataint.sources.mm", sqlSessionFactoryRef = "MMSessionFactory")
public class MMDBConfig extends DBConfigABC { ... }

... not only that the directory hierarchy matters due to the niceties of MyBatis-Spring , since the MapperScan will find anything at or below a directory ... so you cannot put mappers for one connection below the point you need to scan for another ... or they will be sucked up as Mappers for the wrong session.

Test configuration

Also the test configuration for integration tests required a lot of manual annotations to pick up appropriate configuration environment in order to inject things into the context in a working manner to match the running code configuration. Again I had been spoilt by expecting a framework specific automatically working test harness with default test customisations of the runtime code environment thrown in. Instead each integration test needed to use a wrapper that called a RunnerBeanConfig

@PropertySource("classpath:test.properties")
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(loader = AnnotationConfigContextLoader.class, classes = ITRunnerBeanConfig.class)
public class ModelFactoryServiceIT extends ServiceABC { ... }

The RunnerBeanConfig then needs to have all the database and resources configs annotated as Imports or it cannot find the different database sessions. Only after all that is it possible to inject the runtime command classes for testing.

@ComponentScan("uk.ac.bris.adt.erp.dataint")
@PropertySource("classpath:test.properties")
@Configuration
@Import({ DataIntDBConfig.class, MMDBConfig.class ... ResourceConfig.class })
public class ITRunnerBeanConfig implements EnvironmentAware {

  private Environment env;

  @Override
  public void setEnvironment(Environment environment) {
    this.env = environment;
  }
  ...
}

Conclusion

OK so I got it working in the end. However it all felt rather like I was doing something rather bespoke and complicated. This added a great deal of development time (or more accurately configuration wrangling time) to deliver components of persistence layer management and testing - that I had previously expected would all be available already in a mature high level framework. So in summary I probably have to conclude that if you needed to do this task in a more maintainable and standardised manner with a minimum of custom code. Don't use Spring and MyBatis. However if you already use one of those tools ... and you need to add this sort of functionality with them. Then it is certainly doable, and perhaps if you are a newbie to them, like me, this post may be of some use in speeding the job up for you :-}

Thursday 30 July 2015

Picking a language for shell scripting based on its framework ecosystem


  • Do you work somewhere that has a lot of old shell scripts? 
  • Were they all dashed off quickly according to the whims of their original creator - just deploy scripts so maybe they missed some of the care (and documentation) the application code had?
  • Whenever a major revision of said shell scripts is needed, do all of them end up in the bin - sorry trash in case you thought I meant the /usr/bin. Because its easier to write such platform and author specific scripts from scratch than maintain them?

Well the team I recently joined does. So as somebody who personally moved from shell scripts to a shell compatible object orientated scripting language some time ago. Then on to a shell framework, I was asked to assess what to move our menagerie of bash, zshell, cshell, powershell etc. scripts to.

My remit was to recommend a language instead of all those developer specific procedural shell scripts. After all the point of a mainstream language is to introduce cross platform code with a huge set of useful libraries, along with hopefully some standard approaches. Hence get above the sea of shells.

However my preference is to move the team to using a shell framework.

Because what a framework does is give you opinions. It chooses how to do stuff, so you ... and all the developers that come after you, don't have to. In my opinion most of the productivity comes from everyone becoming familiar with the same way of doing things ... whether or not its the best way
(and if it isn't the best way the framework developers spend a lot of time fixing that way with hopefully the minimum of disruption to its API for the end users).

I felt it would be useful to see how active the shell framework ecosystem of a language was, in order to assess how suited to shell scripting it is. Since the goal of a shell framework is essentially the same goal, standardisation of shell scripts. So one indicator for the most suited language is to list the mainstream shell frameworks for the contenders and see which comes out top.

It was decided we will choose from Javascript, Python or Perl. So this is for a team that almost exclusively use Java, which due to its pre-compiled statically typed nature is inherently poorly suited to shell scripting.
Unfortunately Ruby was not to be a candidate. However I have added it anyway, because if its good enough for Amazon, and leading configuration management frameworks such as Puppet ... it should be included.

Strangely it is difficult to find definitive lists of shell frameworks ... perhaps whilst I thought the term had been around for 10 years or more, it has not been coined very effectively ... its not even on Wikipedia ... maybe I made it up! Anyway they exist whether or not there is a proper name for them ... they also often cross over the grey area into being full blown configuration management tools.
So to avoid that argument ... I will add those in as well. I have marked entries with a * where I think it is more a config management tool than a shell framework.

For configuration management systems wikipedia does have an entry ... so using it as a similar benchmark it gives a relative language score of:

Python 11, C 5, Perl 5, Java 4, Ruby 3,
and 1 each for C++, Scala, Erlang, Go and PHP  (none for Javascript)

So on the same sort of  basis I have ranked the languages with the most healthy shell framework ecosystem ... in the belief that it is a good indication that they may be the most suited to being used for shell scripting ...

Python 8

Python has the largest number of well established shell frameworks, as it does in terms of configuration management systems. So on that basis I could award it first prize as the most used language for standardising shell scripting. However numbers are not everything. It doesn't necessarily have the leaders of those two categories of software (see Ruby below)
  1. fabric
  2. cliff
  3. python-deploy-framework
  4. clustershell
  5. builtoncement
  6. ansible *
  7. salt *
  8. bcfg *

Ruby 6

Ruby has capistrano which is possibly the most popular of all shell frameworks, it also has two of the most popular configuration management systems, Puppet and Chef, so on that basis it should perhaps be first or at least joint first with Python. As the most appropriate language for modern shell scripting.

Perl 3

Perl has things that perhaps the other languages don't in related areas. So a number of shell implementations in Perl, and frameworks for writing custom shells. But in terms of what I mean by a shell framework - ie. a tool to run shell commands in a standard way across different servers, shells and platforms it is surprisingly lacking.

Javascript 2

Ummm well Javascript may have broken free of the browser with node.js and other server side runtime environments. But are there any shell frameworks or configuration management systems built with it? ie. tools designed for running shell commands and doing anything other than Javascript installation. Not much TBH and certainly no configuration management engine.
  1. commander.js   (clone of Ruby commander)
  2. shelljs is the leading bunch of utilities for node.js that can act as a shell framework


Conclusion

Its a tie between Python and Ruby, so the decision as to what to go for depends on the individual tools that best suit your shell framework requirements, what they may need to integrate with wrt. configuration management and deployment tools ... plus the existing skills of your developers.
Oh and whether diversity is more important than being the market leader ... or vice versa.

For my own case, Ruby isn't in the running, so that makes it Python based on this premise.
I may follow up this blog post with one comparing the languages wrt. a sample shell script - and see which looks the most maintainable. As a second means of deciding between them.

Note: One other factor is related virtual frameworks ( ... yep made that term up too!)
Basically these wrap up configuration and deployment of virtualised software deployments - either hypervisors or containers - to give a full platform application build - usually for development purposes.
So here Ruby is well ahead with Vagrant along with box grinder