Sunday 2 August 2015

Using Java Spring & MyBatis for dynamic schema integration

Using Spring and MyBatis for dynamic schema integration could perhaps be subtitled, "Eating soup with a fork" ... since that is how it has felt at times. However, that may partially be due to my lack of familiarity with these tools. So this blog posting is about why I got that feeling, how I bent my fork to the purpose ... and whether I missed some tricks in doing so. Please let me know by commenting below :-)

Object-relational mapping - a personal history

I have been developing relational database applications for many years, so I have been through the various stages of database persistence approaches: starting with fairly raw SQL development, or at most fixed DAO wrapping; moving through SQL templating languages and mapping approaches; and, for the past few years, largely adopting the full object-relational mapping (ORM) approach. I even tended to use the same pseudo-ORM top level for NoSQL sources, although of course a lot of ORM features then become inapplicable, such as an ORM model's relational navigation (eg. person.department.head.surname).

Having said that, the object-relational impedance mismatch can still bite. However good an ORM is, some understanding of what its API simplifications and abstractions are doing is needed, to avoid excessive or badly performing SQL. But given that caveat, the time a well implemented full ORM saves in development and maintenance inevitably saves money on all but the simplest persistence requirements. Similarly, the automation it brings to schema migration, constraints, indexing and other data modelling needs is invaluable - as long as one knows what it is doing under the hood, and hence when that 10-20% of database development / customised data design is still required: for example custom functions and triggers, or writable views that present as a table to the ORM, etc.

MyBatis - is it an ORM I see before me?

Recently I have started working with a team who use an older Java technology set: Spring and MyBatis. My current project uses the Java MyBatis SQL framework to integrate various relational database sources.
MyBatis badges itself as a persistence framework, so it is not really claiming to be an ORM. It started as an SQL templating engine and that is still where its core strength lies, but it now has many other features bolted on, including meta-programming to generate Java relational mappers from database sources.

So it does have many ORM-ish features. However, using MyBatis has been rather a painful journey for me, since I have found that it fails to implement many of the ORM design pattern principles my ORM habituation had led me to expect. Also, using MyBatis with Spring MVC means we haven't really got proper MVC, since the model is not a model as it would be with Hibernate ... it's a MyBatis Mapper.
This is not just a minor niggle; on first use a core ORM feature is gone. A Mapper is not an object model of a table's row of data, with transaction and session management under the hood. Instead it is just a convenient object on which to hook up standard data queries and updates, with the management of saves, and the synchronisation between actions on Mappers and the results they return, left entirely manual. So in that sense you get less than some more basic DAO wrappers give you, ie. ones where, when you update a record and then fetch values from it, you get back the data as written to and queried back from the database, so an inserted and saved model automatically holds its new primary key(s).
With Mappers, the data readable from the Mapper will just be the data you set to be updated, not the actual data held in the database as a result of that update. To find out the result of your update you need to ensure you get a fresh query result, by re-querying or flushing the session.
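To make that concrete, here is a small fragment showing the pattern with Generator-style classes; the Person classes and personMapper are illustrative assumptions following MyBatis Generator's naming conventions, not the real project's code.

 // Illustrative fragment: Person, PersonExample and personMapper are assumed Generator-style classes.
 Person person = new Person();
 person.setSurname("Smith");
 personMapper.insert(person);
 // person still holds only the values set above - a sequence-generated key is not
 // populated automatically, unlike an ORM-managed entity (see the workaround below).

 PersonExample example = new PersonExample();
 example.createCriteria().andSurnameEqualTo("Smith");
 Person stored = personMapper.selectByExample(example).get(0);  // fresh query for the real stored row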

There is a workaround for this common requirement: you can annotate an insert with a custom select query to tell MyBatis to re-query just for the key before or after doing the update. However, I found this didn't work well for Oracle, and it was a problem anyway for what I required, largely because I was using Generator-based classes, so adding custom annotations to these would not have worked for me.

@SelectKey(statement="SELECT MY_SEQUENCE.NEXTVAL FROM DUAL", keyProperty="nameId", before=true, resultType=int.class)
int insertMyTable(Name name);

Similarly, there is no lazy update concept: updates are only and always done when you specifically run them. In effect it is like using an ORM with all the relational modelling, data synchronisation machinery and transactional management disabled. Mappers are instead just convenience wrappers for running SQL - MyBatis Generator may do the ORM-like job of creating Java classes that (via XML) map to database tables or views, but mapper instances are not ORM-synchronised table row objects.

The normal object-centric approach is also not available: whilst you can generate code from your database, you cannot generate your database from code. So full persistence data-cycle management is not available out of the box. By that I mean the use of test harness and deployment tools that destroy and recreate the full database, populate it from fixtures, run tests and drop it again; and a migration tool that introspects databases and can migrate back and forth between schema versions to sync them with code versions or vice versa - ie. code -> data schema management, in addition to the MyBatis Generator data -> code direction. So the sort of thing that a full stack web framework like Django has with its ORM.

This was a blow for my particular project, since full data-cycle management is exactly what I required for developing a central data aggregation and integration database. I also required proper fixture management for testing, with the same serialised data input / output components being used for the main work of aggregating data from sources in serialised formats (XML, CSV etc.).
Finally, the system had to pull in, and deliver back out, complex, evolving schemas from source to consumer databases. The one thing that was essential was that all the schema handling and data typing be dynamic, so that the code could handle this rate of change without constantly breaking and having to be rewritten. Hard-coded schema details and static typing therefore had to be avoided for any chance of maintainability ... yet it was to be written in a statically typed language. Again the tool was not suited to the job. So how did I bend this seemingly unsuitable platform to the task in hand?

Bending the fork

The first step was to tackle the dynamic schema vs. static typing issue - Java reflection is your friend here. Then came reinventing some of the missing ORM wheels: fixture loading and schema generation, along with working around the lack of object data synchronisation and automatic session management. Finally, we wanted to make data population speedy by only updating data that actually needed updates, so a hashing mechanism that could work with the serialised data was required.
Note that our in-house Java standards decree that all configuration should be done via annotations rather than XML where available ... which is pretty much everything in Spring ... aside from the core MyBatis Mapper XML files. The aggregation database was Oracle, and various databases including MS SQL Server were the data sources.

Dynamic schema handling

Use MyBatis Generator, adding a config file for each database source and one for the target aggregation database.
Add the name of each table or view required to the relevant config file ... eg.
<table tableName="PERSON" modelType="flat"></table>

Ideally that would be all the schema customisation required, as long as the data source and target naming of tables and columns matched. However, in a handful of cases mapping data (eg. tableA.colB = tableC.colD) was needed in the properties files.

I ran this as part of the Maven build at first, but it proved more manageable to be able to run code generation for the source and target databases at different times, so I switched to wrapping it up in a command class that could be triggered independently via a batch job scheduler, in the same way as the data population commands were to be run. Schema -> code generation tended to be much faster than either data population or code -> schema generation, so frequent execution was not an issue.
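For anyone wanting to do the same, the command wrapper amounts to little more than a thin layer over MyBatis Generator's documented Java API - a rough sketch, where the class name and config file path are illustrative rather than the project's real ones:

 import java.io.File;
 import java.util.ArrayList;
 import java.util.List;

 import org.mybatis.generator.api.MyBatisGenerator;
 import org.mybatis.generator.config.Configuration;
 import org.mybatis.generator.config.xml.ConfigurationParser;
 import org.mybatis.generator.internal.DefaultShellCallback;

 /** Command wrapper so schema -> code generation can be triggered by the batch scheduler. */
 public class GenerateMappersCommand {

   /** configPath points at the per-database generator config, eg. "generatorConfig-oracle.xml" (illustrative). */
   public void run(String configPath) throws Exception {
     List<String> warnings = new ArrayList<>();
     ConfigurationParser parser = new ConfigurationParser(warnings);
     Configuration config = parser.parseConfiguration(new File(configPath));
     // true = overwrite any previously generated Java files
     MyBatisGenerator generator = new MyBatisGenerator(config, new DefaultShellCallback(true), warnings);
     generator.generate(null);
     for (String warning : warnings) {
       System.out.println(warning);
     }
   }
 }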

The core class I wrote to enable dynamic schema handling was called TableMeta; each table gets an instance of this class listing all the metadata about its columns, primary keys and so on.
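For flavour, here is a rough sketch of the shape that class took. Only the getters used later in this post (getTableName, getCols, getPKeys) reflect the real class; the rest is an assumption.

 import java.util.List;
 import java.util.Map;

 /** Sketch of the per-table metadata holder; fields beyond the getters used in this post are assumptions. */
 public class TableMeta {

   private final String tableName;
   private final List<String> cols;               // all column names, in schema order
   private final List<String> pKeys;              // primary key column names
   private final Map<String, Class<?>> colTypes;  // column name -> Java type from the generated model

   public TableMeta(String tableName, List<String> cols, List<String> pKeys,
                    Map<String, Class<?>> colTypes) {
     this.tableName = tableName;
     this.cols = cols;
     this.pKeys = pKeys;
     this.colTypes = colTypes;
   }

   public String getTableName() { return tableName; }
   public List<String> getCols() { return cols; }
   public List<String> getPKeys() { return pKeys; }
   public Class<?> getColType(String col) { return colTypes.get(col); }
 }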

Each TableMeta is injected into a ModelFactoryService, whose job is to return results, or add, edit and delete models, for any Mapper.
The factory service has a reflection-based method for invoking the MyBatis-generated Mapper methods, e.g. selectByExample ...

  protected Object invokeMapperMethod(String methodName, Object parameter) {
    // Default to the generated Example class when no parameter is supplied
    Class<?> klass = exampleClass;
    if (parameter != null) {
      klass = parameter.getClass();
    } else {
      parameter = newExample();
    }
    try {
      Method method = mapperClass.getDeclaredMethod(methodName, klass);
      try {
        // Run the mapper method reflectively
        return method.invoke(mapper, parameter);
      } catch (IllegalAccessException | IllegalArgumentException | InvocationTargetException exRun) {
        Logger.getLogger("ModelFactoryService").log(
            Level.SEVERE, modelClass.getSimpleName() + "." + methodName, exRun);
      }
    } catch (NoSuchMethodException | SecurityException exCall) {
      Logger.getLogger("ModelFactoryService").log(
          Level.SEVERE, "Mapper class has no " + methodName + " method", exCall);
    }
    return 0;
  }


... similarly there were reflection-based set and get column methods, surfaced by whole-Mapper data modification methods such as doUpdate(Object model).
The TableMeta caters for the initial setup of the classes generated for that particular table by MyBatis Generator. Straight use of Class.forName(className), with the table name string manipulated to copy the Generator's standard naming convention, works for this. The table's Mapper, Model and Example classes are then stored as the template classes for the factory, and from those a hash of setters and getters for each of the table's columns can be built automatically.
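In sketch form that resolution, and the building of the setter/getter hashes, looks something like the following. The resolver class, package layout and the upper-cased map keys are illustrative assumptions; the real code lives inside the TableMeta setup.

 import java.lang.reflect.Method;
 import java.util.HashMap;
 import java.util.Map;

 // Sketch only: resolve the Generator-produced classes for a table (eg. PERSON -> Person,
 // PersonMapper, PersonExample) and build column -> setter/getter maps by reflection.
 public class GeneratedClassResolver {

   public void resolve(String modelPackage, String baseName) throws ClassNotFoundException {
     Class<?> modelClass   = Class.forName(modelPackage + "." + baseName);             // e.g. Person
     Class<?> mapperClass  = Class.forName(modelPackage + "." + baseName + "Mapper");  // e.g. PersonMapper
     Class<?> exampleClass = Class.forName(modelPackage + "." + baseName + "Example"); // e.g. PersonExample

     Map<String, Method> setters = new HashMap<>();
     Map<String, Method> getters = new HashMap<>();
     for (Method m : modelClass.getMethods()) {
       if (m.getName().startsWith("set") && m.getParameterTypes().length == 1) {
         setters.put(m.getName().substring(3).toUpperCase(), m);  // keyed by upper-cased property name
       } else if (m.getName().startsWith("get") && m.getParameterTypes().length == 0) {
         getters.put(m.getName().substring(3).toUpperCase(), m);
       }
     }
     // setters/getters would then be stored on the factory alongside mapperClass and exampleClass
   }
 }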

Fixture loading

A SAX-parser-based loader class reads in data fixtures in XML format. Each fixture row is parsed to a hash of tag name to value, which can be passed via a SaverService to the ModelFactory.
An ObjectConvertor static class caters for de-serialising fixture data to the correct Java types, looked up by column name from the TableMeta; the setter hash can then be used to update the matching named columns in the model.
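A minimal sketch of such a handler is below, assuming fixture files of the rough form <dataset><row><col>...</col></row></dataset>; the element names and the onRow hook are illustrative, not the project's actual ones.

 import java.util.HashMap;
 import java.util.Map;

 import org.xml.sax.Attributes;
 import org.xml.sax.SAXException;
 import org.xml.sax.helpers.DefaultHandler;

 // Sketch of a SAX handler turning each <row> element into a tag-name -> value hash.
 public class FixtureRowHandler extends DefaultHandler {

   private Map<String, String> row;
   private StringBuilder text;

   @Override
   public void startElement(String uri, String localName, String qName, Attributes attrs) {
     if ("row".equals(qName)) {
       row = new HashMap<>();
     } else if (row != null) {
       text = new StringBuilder();   // start collecting a column value
     }
   }

   @Override
   public void characters(char[] ch, int start, int length) {
     if (text != null) {
       text.append(ch, start, length);
     }
   }

   @Override
   public void endElement(String uri, String localName, String qName) throws SAXException {
     if ("row".equals(qName)) {
       onRow(row);                   // hand the completed row hash on for saving
       row = null;
     } else if (row != null && text != null) {
       row.put(qName, text.toString());
       text = null;
     }
   }

   protected void onRow(Map<String, String> row) {
     // in the real loader this passes the hash to the ModelFactory via a SaverService
   }
 }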

Fixture dumping

For fixture dumping I cheated a little. Since I was only ever dumping from the target Oracle database, rather than reinventing full serialisation of models (e.g. an ORM with built-in serialisation from any database to XML, JSON, YAML, CSV etc.) I just provided a tool to do the minimum required for my testing and development needs: the ability to serialise any table or view to XML from Oracle.
So in this case, to avoid the maintenance madness of a separate statically typed, fixed-schema solution for each table, I did the XML part in Oracle.
That way I could use a GenericMapper which returned two string fields based on a dynamic SQL query built from the TableMeta: the first being the row XML and the other the concatenated list of the serialised columns to be contained within it. MyBatis' @SelectProvider annotation allows the gluing on of a method taken from another class to generate the SQL ...

 @SelectProvider(type = TableDump.class, method = "getXMLProvider")
 List<GenericModel> getXML(@Param("meta") TableMeta meta, @Param("rows") final int rows);

The method glued on is this one, which uses Oracle's native XML functions and the TableMeta's list of columns to generate a query that directly returns the XML ...

 /**
   * Dynamic SQL generator method: builds XML output for a fixture as row keys and columns.
   */
  public String getXMLProvider(Map<String, Object> params) {
    final TableMeta meta = (TableMeta) params.get("meta");
    final int rows = (int) params.get("rows");

    String sql = new SQL() {
      {
        // Row element carrying the primary key columns as attributes
        String rowXML = " '' || XMLElement(\"row\", XMLAttributes(";
        for (String col : meta.getPKeys()) {
          rowXML += col + " AS \"" + col.toLowerCase() + "\", ";
        }
        rowXML = rowXML.substring(0, rowXML.length() - 2) + ")) \"ROWKEYS\"";
        SELECT(rowXML);
        // One XML element per column, concatenated into a single string
        String queryXML = "''";
        for (String col : meta.getCols()) {
          queryXML += " || XMLElement(\"" + col.toLowerCase() + "\", " + col + ")";
        }
        SELECT(queryXML + " \"ROWCOLS\"");
        FROM(meta.getTableName());
        for (String col : meta.getPKeys()) {
          ORDER_BY(col);
        }
        if (rows > 0) {
          WHERE("rownum <= " + rows);
        }
      }
    }.toString();
    return sql;
  }

The above snippet shows an example of MyBatis SQL templating - the core of MyBatis.
The result is a single generic Mapper that, called with the appropriate table's TableMeta, can return an XML dump of any table in the target database. For test fixture generation it can be passed a result length ... since usually ten or twenty rows will be sufficient for integration testing.

Save methods

I won't go into the save methods in detail - suffice to say the data was hashed on the way into Oracle, and a query of primary keys against the last-saved hashes of all rows then allowed incremental updates: each incoming XML source row is hashed and checked first, to determine whether an update or insert is actually required. Whilst this could not take advantage of the database's or Java's own hashing - since it had to be compatible with the serialised data - it did have a significant impact. The rate of incremental change meant we were seeing a maximum of 1% data churn per table, so even with all the hashing and comparison overhead, an incremental update was still around 30 times faster.
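As a sketch of the decision logic only - the class and method names, and the SHA-1 digest choice, are assumptions rather than the project's actual code:

 import java.nio.charset.StandardCharsets;
 import java.security.MessageDigest;
 import java.security.NoSuchAlgorithmException;
 import java.util.Map;
 import java.util.TreeMap;

 // Sketch: the target table keeps the hash of the serialised row it was last saved from;
 // incoming rows are hashed the same way and only written when the digest is new or changed.
 public class IncrementalSaveCheck {

   public enum Action { INSERT, UPDATE, SKIP }

   /** storedHashes is a one-off query of primary key -> last-saved hash for the whole table. */
   public Action decide(String pk, Map<String, String> xmlRow, Map<String, String> storedHashes)
       throws NoSuchAlgorithmException {
     String hash = hashRow(xmlRow);
     String stored = storedHashes.get(pk);
     if (stored == null)       return Action.INSERT;   // row not seen before
     if (!stored.equals(hash)) return Action.UPDATE;   // row changed at the source
     return Action.SKIP;                               // unchanged - the ~99% common case
   }

   /** Hash the serialised column values so the same digest can be recomputed from any source format. */
   private String hashRow(Map<String, String> row) throws NoSuchAlgorithmException {
     MessageDigest md = MessageDigest.getInstance("SHA-1");
     for (Map.Entry<String, String> e : new TreeMap<>(row).entrySet()) {   // stable column order
       md.update((e.getKey() + "=" + e.getValue() + ";").getBytes(StandardCharsets.UTF_8));
     }
     StringBuilder hex = new StringBuilder();
     for (byte b : md.digest()) hex.append(String.format("%02x", b));
     return hex.toString();
   }
 }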

For versioned data, a core data table and a versioning table were required, and both then needed access to the new primary keys. As mentioned, this is not straightforward in MyBatis, so the easiest solution was a bespoke version Mapper that just wrapped a single versioning sequence and could be used to update the related data and version tables by calling the custom VersionMapper's nextVal or currVal methods.
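In sketch form, with an illustrative sequence name, that bespoke Mapper is little more than:

 import org.apache.ibatis.annotations.Select;

 // Sketch of the bespoke version Mapper: a thin wrapper around a single Oracle sequence
 // so both the data table and its versioning table can share the same new key.
 public interface VersionMapper {

   @Select("SELECT DATAINT_VERSION_SEQ.NEXTVAL FROM DUAL")
   long nextVal();

   @Select("SELECT DATAINT_VERSION_SEQ.CURRVAL FROM DUAL")
   long currVal();
 }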

Code Schema Cycle

As mentioned, MyBatis doesn't cover the code-to-schema half of the persistence cycle, so the options were to employ a separate full schema life cycle framework such as Liquibase, or to roll my own. In this case my requirements were not for extensive migration features, since as an aggregation database the schema could be snapshotted, dropped and rebuilt. So to avoid complexity and further dependencies I just added a tool to build the Oracle schema from standard dumps of the schema made with an Oracle client tool. You dump each object to a separate DDL SQL file; the tool then checks all the files, works out which object type each relates to, and loads them in a sequence which should prevent referential integrity clashes, ie.

DATABASE LINK, SEQUENCE, TABLE, MATERIALISED VIEW, FUNCTION, VIEW,  INDEX, PROCEDURE, TRIGGER, CONSTRAINT, FOREIGN KEY
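A minimal sketch of how that ordering might be applied, assuming one DDL file per object; how a file's object type is determined (objectTypeOf below) is a placeholder - in practice it could be read from the DDL itself or from a file naming convention.

 import java.io.File;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Comparator;
 import java.util.List;

 // Sketch: order the dumped DDL files by object type before executing them, so that
 // dependencies exist before the objects that reference them.
 public class SchemaBuilder {

   private static final List<String> LOAD_ORDER = Arrays.asList(
       "DATABASE LINK", "SEQUENCE", "TABLE", "MATERIALISED VIEW", "FUNCTION",
       "VIEW", "INDEX", "PROCEDURE", "TRIGGER", "CONSTRAINT", "FOREIGN KEY");

   public List<File> orderDdlFiles(File ddlDir) {
     File[] dumped = ddlDir.listFiles((dir, name) -> name.endsWith(".sql"));
     List<File> files = new ArrayList<>(Arrays.asList(dumped));
     // Sort by the position of each file's object type in the load order
     files.sort(Comparator.comparingInt((File f) -> LOAD_ORDER.indexOf(objectTypeOf(f))));
     return files;   // each file is then executed in turn against the freshly dropped schema
   }

   private String objectTypeOf(File ddlFile) {
     return "TABLE";   // placeholder for DDL inspection or a naming convention
   }
 }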

This is wrapped up as a DatabaseReset command that either drops all data or the full database, and rebuilds the schema. So a run of reset followed by the GenerateMappers command gives a freshly built persistence layer and the Model code to talk to it. It would be nice to have the schema built from the Mapper code and a more standard, coherent code-schema cycle, but given Mappers are in use, that would not be available even if Liquibase were in the mix.

Spring issues

So as a newbie to MyBatis I guess I had some problems with it. But, as long as this slightly irascible post already is - guess what, I also had issues with Spring ... so I will make it even longer by getting those off my chest too :-)

Spring seems to require huge amounts of configuration to do what you want. It can be made to do most things, but ... maybe to some extent because of that ... it takes an awful lot of configuration to do some of the things a more opinionated full stack web framework would do out of the box. I guess I need to come across the sort of edge cases which are really difficult in a full stack framework to appreciate Spring, since currently it feels like it needs a great deal of maintenance-heavy tinkering to do some of the basics.

The first surprise was that all my Spring @Service classes default to singletons. So in order to have, for example, a ModelFactory for each of the two different tables that a versioned update required, I needed to build a bean factory, make these services beans, and annotate them as prototypes - ie. normal classes, not singletons. I guess this is because all of this related to command classes rather than Spring MVC web classes ... which would have had web session scope.

  @Bean
  @Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
  public ModelFactoryService modelService() {
    return new ModelFactoryService(prop, hashService());
  }

So all my individually injected service classes tended to have to be un-injected, with the bean factory injected in their place.
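In sketch form, one way this plays out is a command pulling fresh prototype instances from the context; the command class name is illustrative, and the real code injects the project's own bean factory rather than the ApplicationContext shown here.

 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.context.ApplicationContext;
 import org.springframework.stereotype.Service;

 // Sketch: a command needing two independent ModelFactoryService instances at once.
 @Service
 public class VersionedSaveCommand {

   @Autowired
   private ApplicationContext context;

   public void run() {
     // Each getBean call returns a new instance because modelService() is prototype scoped
     ModelFactoryService dataFactory    = context.getBean(ModelFactoryService.class);
     ModelFactoryService versionFactory = context.getBean(ModelFactoryService.class);
     // ... configure each factory with its own TableMeta and perform the versioned save
   }
 }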

Along with that, the wiring of database sessions to mapper classes seemed rather frail.
The only way to ensure it worked was never to use a generic session handling method, but always to use separate prototype beans for separately named session classes, each annotated to find the correct mappers for its particular database ...

@Configuration
@MapperScan(basePackages = "uk.ac.bris.adt.erp.dataint.sources.mm", sqlSessionFactoryRef = "MMSessionFactory")
public class MMDBConfig extends DBConfigABC { ... }

... not only that, the directory hierarchy matters due to the niceties of MyBatis-Spring: since @MapperScan will find anything at or below a directory, you cannot put the mappers for one connection below the point you need to scan for another ... or they will be sucked up as Mappers for the wrong session.

Test configuration

The test configuration for integration tests also required a lot of manual annotations to pick up the appropriate configuration environment, in order to inject things into the context in a way that matched the running code configuration. Again I had been spoilt by expecting a framework-specific, automatically working test harness, with default test customisations of the runtime code environment thrown in. Instead each integration test needed to use a wrapper that called a RunnerBeanConfig ...

@PropertySource("classpath:test.properties")
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(loader = AnnotationConfigContextLoader.class, classes = ITRunnerBeanConfig.class)
public class ModelFactoryServiceIT extends ServiceABC { ... }

The RunnerBeanConfig then needs to have all the database and resource configs annotated as Imports, or it cannot find the different database sessions. Only after all that is it possible to inject the runtime command classes for testing.

@ComponentScan("uk.ac.bris.adt.erp.dataint")
@PropertySource("classpath:test.properties")
@Configuration
@Import({ DataIntDBConfig.class, MMDBConfig.class ... ResourceConfig.class })
public class ITRunnerBeanConfig implements EnvironmentAware {

  private Environment env;

  @Override
  public void setEnvironment(Environment environment) {
    this.env = environment;
  }
  ...
}

Conclusion

OK, so I got it working in the end. However, it all felt rather like I was doing something quite bespoke and complicated. It added a great deal of development time (or, more accurately, configuration-wrangling time) to deliver components of persistence layer management and testing that I had previously expected would already be available in a mature, high-level framework. So in summary I probably have to conclude that if you need to do this task in a more maintainable and standardised manner with a minimum of custom code: don't use Spring and MyBatis. However, if you already use one of those tools and you need to add this sort of functionality with them, then it is certainly doable, and perhaps, if you are a newbie to them like me, this post may be of some use in speeding the job up for you :-}