RDS database migration with Lambda

No Comments

When I was building Java-based web applications we had some handy tools like Flyway and Liquibase for database schema migration. Nowadays I am using Lambda for quite some projects, and recently I had to use RDS (MySQL, as well as Aurora MySQL). The Lambda execution model is of course really different compared to the long-running web applications in Spring. I tried a naive approach to copy the ideology of schema-checking at boot, migrating whenever the DB is behind (in schema version), or just continue basic flow, however this philosophy has some downsides:

  • The schema checking will get expensive! It takes at least an additional 100ms (rounded up) when you connect, perform a version check select-statement. When called thousands or millions of times, this is really costly!
  • Lambda’s basic operating model is that of many concurrent executions. Which makes it difficult to plan for a migration. Which execution should be the migration? What about concurrently running executions?

I created a version for performing MySQL migrations which can be found on Github. It has support for incremental migrations based on files named by using semantic versioning. I will use it in the projects involving RDS MySQL and evolve it in time. Next I will discuss the rationale and some of the design details.

Reasons for choosing Lambda for schema migration execution

While it might not be a typical pick, Lambdas have some upsides as well:

  • To achieve a homogeneous application landscape
  • To have near zero infrastructure costs for executing migrations
  • It offers an isolated – easy to secure – execution environment

SQL migration with Lambda as executor

A database schema typically supports a business domain, for which multiple Lambdas are accessing it to perform their work. Having many Lambdas accessing your schema can cause a coupling on your deployments (meaning, you need to deploy multiple Lambdas at once when you make a schema migration). Ideally you don’t want to lower this coupling. There are many frameworks and best practices about low-impact schema migrations and defensive programming approaches for your data layer, but that’s beyond the scope of this article. In practice, it’s rarely a single Lambda that accesses your datastore. So ideally you want to have version awareness across Lambdas on code level, as well as runtime level.

With Lambda’s obvious constraints in execution time and memory it still qualifies for most migration situations. I rarely surpassed even a 5 minute window in Flyway, so 15 minutes is plenty for the majority of situations. Memory-wise 3Gb will prove to be enough for your average use case. I would recommend to keep all the heavy lifting in the database in simple setups. So the data that is under migration should never be queried to the Lambda. If migrations are really difficult, I would recommend to use temporary tables to restructure the data. Anything beyond this scope would require a different service like AWS data migration.

The MySQL Lambda migrator

While still at its infancy, I released an early version of this tool on Github. It supports the following features:

  • Incremental migrations based on .sql files in the migrations folder, with file named based on semver.
  • Creates an initial version meta-table (called db_version)
  • Executes forward migrations in sequence
  • Is version aware (through db_version) and only executes subsequent updates
  • Is able to execute a single file again by providing a single version string.
  • It pushes queries with semicolon (;) separation in one query.
  • Has support for database connection parameters

Forward migrations

I tried it out in two projects and am happy with the experience. A first requirement is that your projects need to be able to support a forward migration strategy. Your code needs to implement a data-access layer which is version-aware, or at a minimum as tolerant as possible for changes in property names, data types, ordering, and missing or added properties.

The primary goal for the Github repo is to clone it and adjust to your needs. You will probably want to change it. It might be nice to create an NPM package at some point. Feel free to reach out on Twitter or some other medium it you want to collaborate. Check out more of my blog posts:
Improving the Lambda developer experience
Will AWS CDK replace Terraform and the Serverless Framework?
Use Serverless AWS step functions to reduce VPC costs

Kevin van Ingen

Kevin has a background in software engineering, economics and information science. After working as a development freelancer and teacher he returned to apply a broad multidisciplinary perspective to development projects.

Comment

Your email address will not be published. Required fields are marked *