Many software projects are still using relational databases as an important part of their technology stack. This typically requires the handling of database migrations, also often called schema migrations. Reasons to perform migrations on the database are manifold. A few examples:
- New features require new sets of database tables, views and indexes.
- Bugfixes require changes to existing database objects.
- Performance problems require new indexes for certain database tables.
The problem – or let’s better call it challenge – is to ensure that a certain software release is always delivered with the matching state of the database. Especially if there are several installations of an application this requires:
- Supporting new installations of a software release from scratch and
- upgrading from any existing release in the field to the latest one.
In addition (update) installations for different releases on different test stages must be handled as well and are potentially imposing requirements to our way of handling database migrations.
All this can of course be “solved” using a bunch of shell- and sql-scripts. But that is not really a sustainable solution as these scripts tend to become overly complex and hard to maintain quite quickly.
Probably by now it is not hard to guess what is the better alternative to this :).
Flyway is implemented in Java and the installation is really straightforward. Download the latest package from the Flyway homepage and unpack it to a proper location on the target machine. Make sure to either create a soft-link to the
flyway-executable or add the installation directory to your
PATH. Keep in mind that the installation is required on developer machines as well as on all servers where the application – or more precise the database of your application – is running. Typically this means test and productive environments.
When talking about tools for schema migrations there are typically two contenders mentioned: Flyway and Liquibase. While we are discussing Flyway in this post you might also want to take a look at our article on Liquibase here.
On a Mac however you can also use Homebrew to install Flyway. This is more convenient and will create the required softlink to the
/usr/local/bin/ right away.
brew install flyway
As Flyway has been already installed on my machine – due to some earlier evaluation – I used the opportunity to upgrade to the latest release 4.0.3.
Thomass-MacBook-cc:bin thomasjaspers$ brew upgrade flyway ==> Upgrading 1 outdated package, with result: flyway 4.0.3 ==> Upgrading flyway ==> Using the sandbox ==> Downloading https://search.maven.org/remotecontent?filepath=org/flywaydb/flyway-commandline/4.0.3/flyway-commandline-4.0.3.tar.gz ######################################################################## 100,0% 🍺 /usr/local/Cellar/flyway/4.0.3: 21 files, 12.0M, built in 8 seconds
We are good to go if we can execute
flyway from anywhere on the command-line. This should print out a quite extensive usage message showing that the command has been found and was executed successfully.
Database changes in Flyway are bundled in so called Migrations. In this blog post we will only consider Migrations that are written as plain SQL. There are more advanced concepts like writing Migrations in Java, but those we will be left for discussion in some forthcoming blog post. For all Migrations there is one fundamental concept in Flyway:
A Migration must no change once it has been applied to the database!
The reason for this is simple: If this would not be the case one cannot be sure that two installations – with the “same” set of Migrations – are really resulting in the same database state. The result would be chaos and no one likes chaos!
Flyway is enforcing this by keeping track of a checksum for each Migration that has been executed (together with other information). This checksum is stored in the database in the
schema_version table. On first execution this table is created automatically by Flyway in the same schema as the other database objects created with Flyway for that project. Thus there are potentially multiple instances of this table, one per schema that is under control by a Flyway project.
In case the checksum for an already installed Migration has changed Flyway would acknowledge this with an error and stops the installation.
Thomass-MacBook-cc:db_flyway_sample thomasjaspers$ flyway migrate Flyway 4.0.3 by Boxfuse Database: jdbc:postgresql://localhost:5432/flywaydemo (PostgreSQL 9.4) ERROR: Validate failed: Migration checksum mismatch for migration 2.1 -> Applied to database : -122752047 -> Resolved locally : -609203476
Of course there are ways around this by manipulating checksum values in the
schema_version-table or by applying certain configuration values to Flyway. But obviously this should only be the last resort in case some really bad things have happened. Therefore we will take a closer look at how to avoid bad things from happening later on in this blog post when discussing best practices.
For the time being let’s take a look at an example of a Flyway project.
Flyway Sample Project
This is a simple example of a database migration file that is named
CREATE TABLE flyway_test ( key VARCHAR(64), value VARCHAR(255), PRIMARY KEY(key) ); ALTER TABLE flyway_test OWNER TO flywaydemo;
Well, one might think “Hey, this looks exactly like a plain SQL file”. Well the reason for this probably is that this is a plain SQL file. And this is part of the beauty of the whole approach. There is no need to learn anything new here beside a bit of configuration and execution of the tool. The SQL statements from the Migration-files can easily be tested by executing them directly in any SQL tool or shell.
Now how does Flyway know which SQL-files aka Migrations must be executed? And how does it know which database to connect to? You could do all this by giving a huge amount of command line parameters. But for sure the better solution is using the
flyway.conf configuration file for this.
flyway.driver=org.postgresql.Driver flyway.url=jdbc:postgresql://localhost:5432/flywaydemo flyway.user=flywaydemo flyway.password=flywaydemo flyway.locations=filesystem:src/main/resources/flyway/migrations flyway.sqlMigrationPrefix=V flyway.sqlMigrationSeparator=__ flyway.sqlMigrationSuffix=.sql flyway.validateOnMigrate=true
This configuration file should be stored to the project directory of the database migration project. Then Flyway is executed within that directory and also all configuration entries made are relative to that directory. This way the project can be executed easily on different environments. It would be possible to define a list of directories for the
flyway.locations. But I think it is better to use sub-directories per release. This can also be seen from the sample project. The directory used might remind you of the Maven directory structure. That is right :-). There is a blog post of its own showing how to execute Flyway from Maven. The directory structure used in the example is a preparation for this.
The complete sample project shown in this blog post can also be found on GitHub. It is using PosgreSQL as a database, but that can be easily changed to almost any other relational database.
The first few entries are describing the database and how to access it. Note: Flyway comes with a set of pre-installed database JDBC-drivers. PostgreSQL is one of them and therefore no additional action is required here. Otherwise you would need to add the proper driver-JAR to the
libexec/drivers-directory of your Flyway installation.
The next entries are defining where to find Migration-files and some naming conventions for them. Setting the
flyway.validateOnMigrate to false would disable the validation whether or not existing Migrations have been changed. In any real project this should always be set to
true. A comprehensive list of available configuration parameters – with their descriptions – can be found from here.
The Flyway configuration approach is two-tiered. Global settings can be configured in the
flyway.conf configuration file in the Flyway installation directory. Additional configuration entries – or overwriting some of the global ones – can then be done in the project-specific
flyway.conf configuration file.
Then we can just execute Flyway from the project directory by executing
flyway migrate as follows.
ThomassacBookcc:db_flyway_sample thomasjaspers$ flyway migrate Flyway 4.0.3 by Boxfuse Database: jdbc:postgresql://localhost:5432/flywaydemo (PostgreSQL 9.4) Successfully validated 3 migrations (execution time 00:00.017s) Creating Metadata table: "public"."schema_version" Current version of schema "public": << Empty Schema >> Migrating schema "public" to version 1.1 - create test table Migrating schema "public" to version 1.2 - create test view Migrating schema "public" to version 2.1 - create test index Successfully applied 3 migrations to schema "public" (execution time 00:00.066s).
The log-messages are helpful to check installations. It is printed out whether a Migration was already applied to the system or is newly applied. In the above example all Migrations are new. Simply re-executing the same command shows the following output.
ThomassacBookcc:db_flyway_sample thomasjaspers$ flyway migrate Flyway 4.0.3 by Boxfuse Database: jdbc:postgresql://localhost:5432/flywaydemo (PostgreSQL 9.4) Successfully validated 3 migrations (execution time 00:00.018s) Current version of schema "public": 2.1 Schema "public" is up to date. No migration necessary.
That is basically it, besides the need to execute the Migrations in the proper order. This is achieved by the version prefix of each file that is separated from the rest of the filename by the
sqlMigrationSeparator that is defined in the
flyway.conf configuration file. In our example two underscores are used. Having the following three files – distributed over two directories – they would thus be executed in this order:
V1_1__create_test_table.sql V1_2__create_test_view.sql V2_1__create_test_index.sql
This could basically also be seen from the output of the flyway-execution above. Using the sample-project from GitHub as a blueprint this is hopefully enough to get started with Flyway. Nevertheless there are some best practices that should be considered to use the tool most efficiently.
Migrations must be kept stable already during development time
From my own experience I can tell that this is a great topic for lots of discussions inside the development team :-). Of course one can argue that Migrations must only be stable – read “unchanged” – for productive releases. Thus new Migrations added for a not yet delivered release could still be changed. In theory this is true, but going down that road a lot of time will be spent to fix broken installations on development and test environments. The reason behind this is that data is often setup locally or on some test environment to test certain scenarios. In such cases an upgrade installation is definitely preferred over a re-installation where all manually added/changed data on that environment will be lost. Such an upgrade installation is not (easily) possible if there are “broken” Migrations.
Thus if there is a bug in a Migration like wrong definition of a datatype, a wrong name or things like that: Simply write an additional Migration to fix this and do not change already committed ones.
Migrations must be grouped in a meaningful way
Basically there are two possibilities to achieve this:
- Grouping the files by release.
- Grouping the files by feature.
The concept is in both cases the same, thus grouping the files together in release- or feature-specific sub-directories of the directories scanned for Migrations. What must be always kept in mind is the fact that the execution order is determined by the prefix. This is quite easy when grouping by release, but can become really hard when this is done by feature. The reason for this is that features might be developed in parallel that are related to the same set of database objects. In that case the required order might be mixed up. Therefore it might be better to develop the database migrations per release even if the features are developed independently in branches.
Conclusion and Outlook
Flyway is a great tool to handle database migrations. With tools like Flyway – or likewise Liquibase – there is no excuse to still mess around with some custom-made “solutions” to handle database migrations. The tool keeping track on what has already been executed on a system and what needs to be executed still is extremely helpful. This comes along with some nice logging during the installation and more features that will be part of some forthcoming blog posts on Flyway. This includes integration to Maven and writing complex Migrations in Java. Stay tuned :-).