
How to upgrade your Aurora Serverless Database Schema using CDK and Lambda

16.1.2023 | 11 minutes of reading time

Imagine the following situation: You are building a serverless application using, for example, Lambda functions; you set up your system with CDK (or CloudFormation) and store your data in Aurora Serverless. How would you automate database schema changes or prefill tables in your database? You don't know when your Lambdas will be started or how many there will be. Maybe multiple services connect to the same datastore and none of them is a good fit for running your upgrades?

Why not manage your database schema changes within your infrastructure setup with CDK? In this blog article I will describe how to set up a CDK stack that applies database changes with Liquibase whenever you deploy your infrastructure changes, with an example you can try alongside in your own AWS account.

Setup/Preconditions

As mentioned above, the approach uses CDK. If you want to follow along and deploy the example in your own AWS account, I recommend installing CDK on your machine along with its prerequisites. Docker is also used when deploying the stacks into your AWS account and therefore needs to be installed as well.

If you have not used CDK before, also make sure that the AWS account you want to use is bootstrapped; if it is not, run cdk bootstrap aws://ACCOUNT-NUMBER/REGION.

How do we do all this?

Architecture/Component Overview

The example is composed of multiple stacks to separate the different components that are used.

  1. We have an application stack that sets up our networking components, such as a VPC and security groups, and creates our database and database migration stacks.
  2. Next we have a stack to create and configure everything for our Aurora Serverless database.
  3. Last but not least we have a stack that includes everything we need to do automatic database upgrades with Liquibase.

The picture below depicts the used constructs and stacks to give more of an overview over the solution.

As I will reference files directly in the following description, I recommend checking out the source code of the example application, or at least having a look at it on GitHub!

Networking infrastructure

The networking setup for this example is nothing special and can be found in the application-stack.ts. However, it is needed because Aurora Serverless is deployed into a VPC, in most cases into the private subnets of said VPC. Therefore we need some configuration to enable our Lambda to talk to Aurora Serverless.

We use two separate security groups to control access to the database and the Lambda. By adding an ingress rule, we allow traffic from our database migration security group through our database security group on the default PostgreSQL port 5432.

const databaseSecurityGroup = new SecurityGroup(
    this,
    "DatabaseSecurityGroup",
    {
        securityGroupName: "DatabaseSecurityGroup",
        vpc,
    }
);
const databaseMigrationSecurityGroup = new SecurityGroup(
    this,
    "DatabaseMigrationSecurityGroup",
    {
        securityGroupName: "DatabaseMigrationSecurityGroup",
        vpc,
    }
);
databaseSecurityGroup.addIngressRule(
    databaseMigrationSecurityGroup,
    Port.tcp(5432),
    "allow access for database migration lambda"
);

RDS Aurora Serverless

As with the networking components, the RDS Aurora Serverless setup is rather standard and can be found in the database-stack.ts. The database credentials are generated when the database stack is created and are stored securely in AWS SecretsManager, so our database migration Lambda can access them later.

this.credentials = Credentials.fromGeneratedSecret(username, {
    secretName: "/aurora/databaseSecrets",
});

We also limit the subnets our database cluster can use to only our private subnets.

const databaseSubnetGroup = new SubnetGroup(this, "DatabaseSubnetGroup", {
    description: "SubnetGroup for Aurora Serverless",
    vpc: props.vpc,
    vpcSubnets: props.vpc.selectSubnets({
        subnetType: SubnetType.PRIVATE_WITH_EGRESS,
    }),
});

As of right now, only a few PostgreSQL versions are supported by Aurora Serverless, hence the Postgres engine version 11.16. I recommend checking for newer supported versions when you set up an Aurora Serverless cluster using Postgres.

this.database = new ServerlessCluster(this, "DemoCluster", {
    engine: DatabaseClusterEngine.auroraPostgres({
        version: AuroraPostgresEngineVersion.VER_11_16,
    }),
    credentials: this.credentials,
    defaultDatabaseName: "demo",
    vpc: props.vpc,
    subnetGroup: databaseSubnetGroup,
    securityGroups: [props.databaseSecurityGroup],
    scaling: {
        autoPause: Duration.minutes(5),
        minCapacity: AuroraCapacityUnit.ACU_2,
        maxCapacity: AuroraCapacityUnit.ACU_4,
    },
});

The scaling options are set to a minimum for this demo and include pausing the database after 5 minutes of inactivity. This can result in longer startup times.

Database migration with CDK Custom Resource

Now that the surrounding infrastructure is defined, we can take care of the infrastructure components we need for the database migration.

In total we will need two policy statements for the Lambda, one role that is used while executing the Lambda, the Lambda function itself, a custom resource provider and a custom resource.

As mentioned while setting up the Aurora Serverless cluster, we store the credentials for our database in AWS SecretsManager. Therefore we need to allow our Lambda to access SecretsManager by creating the secretsManagerPolicyStatement. Also, as hinted at while creating the network infrastructure, we need to allow our Lambda to run in a VPC. In AWS this is achieved through an elastic network interface (ENI) that the Lambda creates and connects to. The Lambda can then reach services located in the private subnets of the VPC through the ENI deployed in the VPC. This is allowed through the policy statement createENIPolicyStatement.

For the policies to take effect while executing the Lambda, we create a role that extends the AWSLambdaBasicExecutionRole managed policy with the previously described policies.
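Put together, the two statements and the role might look roughly like this. This is a sketch: the exact action lists and resource scoping are assumptions on my part, so check the repo for the real definitions.

```typescript
import {
    Effect,
    ManagedPolicy,
    PolicyStatement,
    Role,
    ServicePrincipal,
} from "aws-cdk-lib/aws-iam";

// Allow reading the database secret. In a real setup you would scope
// the resource down to the ARN of the generated secret instead of "*".
const secretsManagerPolicyStatement = new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["secretsmanager:GetSecretValue"],
    resources: ["*"],
});

// Allow the Lambda to create and manage the ENI it uses inside the VPC.
const createENIPolicyStatement = new PolicyStatement({
    effect: Effect.ALLOW,
    actions: [
        "ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DeleteNetworkInterface",
    ],
    resources: ["*"],
});

// Execution role based on the AWSLambdaBasicExecutionRole managed policy,
// extended with the two statements above.
const databaseMigrationFunctionRole = new Role(this, "DatabaseMigrationFunctionRole", {
    assumedBy: new ServicePrincipal("lambda.amazonaws.com"),
    managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName("service-role/AWSLambdaBasicExecutionRole"),
    ],
});
databaseMigrationFunctionRole.addToPolicy(secretsManagerPolicyStatement);
databaseMigrationFunctionRole.addToPolicy(createENIPolicyStatement);
```

The snippet assumes it lives inside the stack class (hence `this`); the role is later passed to the Lambda function as databaseMigrationFunctionRole.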

Let's take a look at the Lambda function. As the function uses Liquibase and is written in Java, we need to set the runtime to a currently supported Java runtime. In this case we opt for Java 11, which runs on Amazon Corretto.

In order for the Lambda to fetch the secret from SecretsManager, we need to pass the region and the secret name into the function. We can use environment variables here, as the secret name itself does not need special protection.

To make the Lambda connect to our VPC at execution time, we also need to set the VPC, security group and subnets the Lambda should use. As our function has no need to be available to the general public, we limit it to the private subnets. Finally, to allow the Lambda to create and attach to ENIs, we set the previously created role on the Lambda.

Building and bundling the Maven function for Lambda

Lambdas in CDK come with a nice feature: you can define a build process for your Lambda when you define the Lambda itself. When you specify where your code assets are located, you can also apply a set of options. One of those is the bundling option, which can be used to build and package your code. For our use case this is quite handy: we need to run a Maven build, and by specifying it directly in CDK we do not need to add any other build and packaging system.

AWS provides a set of predefined images you can use to build your software. The Java 11 build image contains Java and Maven and thereby everything we need to build and package our code. The Lambda function then simply picks up the built and bundled artifact from the asset-output directory.

const databaseMigrationFunction = new Function(
    this,
    "DatabaseMigrationFunction",
    {
        runtime: Runtime.JAVA_11,
        code: Code.fromAsset(path.join(__dirname, "./database-migration"), {
            bundling: {
                image: Runtime.JAVA_11.bundlingImage,
                user: "root",
                outputType: BundlingOutput.ARCHIVED,
                command: [
                    "/bin/sh",
                    "-c",
                    "mvn clean install " +
                        "&& cp /asset-input/target/databaseMigration.jar /asset-output/",
                ],
            },
        }),
        handler: "migration.Handler",
        environment: {
            DATABASE_SECRET_NAME: props.credentials.secretName!,
            REGION: props.database.env.region,
        },
        vpc: props.vpc,
        vpcSubnets: props.vpc.selectSubnets({
            subnetType: SubnetType.PRIVATE_WITH_EGRESS,
        }),
        securityGroups: [props.databaseMigrationSecurityGroup],
        timeout: Duration.minutes(5),
        memorySize: 512,
        role: databaseMigrationFunctionRole,
    }
);

In order to run our database migration Lambda when deploying our infrastructure, we need to specify a CDK custom resource provider. By referencing our Lambda as the onEventHandler, our Lambda is called for every create, update and delete event of the custom resource.

const databaseMigrationFunctionProvider = new Provider(
    this,
    "DatabaseMigrationResourceProvider",
    {
        onEventHandler: databaseMigrationFunction,
    }
);

Now that we have a custom resource provider, we can go ahead and specify the associated custom resource. If we only specified the custom resource and pointed it to the custom resource provider's service token, our Lambda would be executed once, upon creation of the resource. This is a problem if you want continuous database upgrades without destroying resources regularly. Therefore, the custom resource needs to change with every deployment so that Liquibase can check for necessary database upgrades.

An easy way to achieve this is to add a date property to the resource and set it to the current date and time, giving us an ever-changing value.

new CustomResource(this, "DatabaseMigrationResource", {
    serviceToken: databaseMigrationFunctionProvider.serviceToken,
    properties: {
        date: new Date(Date.now()).toUTCString(),
    },
});

The database migration code

The database migration is done using Liquibase. As Liquibase requires Java to run, we also write our Lambda in Java. AWS offers good documentation on how to write Lambda functions in Java, so most of the general handler setup and dependencies can be found there. For our use case, we need to add secretsmanager, liquibase-core and postgresql to our Maven dependencies in the pom.xml.

As you may notice in our Handler.java class, we do not implement the basic RequestHandler interface as a regular Lambda would. By extending AbstractCustomResourceHandler, our Lambda gets a few methods to implement, depending on what happens to our CDK resource. For this example we want to run the database migration when the CDK resource is created as well as when it is updated. On resource deletion, Liquibase is not executed – this may be different depending on your use case!
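The shape of such a handler looks roughly like this. Note this is a sketch assuming the cloudformation module of Powertools for AWS Lambda (Java); the exact base class and response type in the repo may differ.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.events.CloudFormationCustomResourceEvent;
import software.amazon.lambda.powertools.cloudformation.AbstractCustomResourceHandler;
import software.amazon.lambda.powertools.cloudformation.Response;

public class Handler extends AbstractCustomResourceHandler {

    @Override
    protected Response create(CloudFormationCustomResourceEvent event, Context context) {
        // run Liquibase when the custom resource is first created
        migrateDatabase();
        return Response.success("DatabaseMigrationResource");
    }

    @Override
    protected Response update(CloudFormationCustomResourceEvent event, Context context) {
        // the ever-changing date property triggers an update on every deployment
        migrateDatabase();
        return Response.success(event.getPhysicalResourceId());
    }

    @Override
    protected Response delete(CloudFormationCustomResourceEvent event, Context context) {
        // intentionally no Liquibase run on resource deletion
        return Response.success(event.getPhysicalResourceId());
    }

    private void migrateDatabase() {
        // fetch the secret and run Liquibase, as described in the next section
    }
}
```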

For the migrateDatabase method to actually migrate something, we first need to fetch the secret from AWS SecretsManager in order to connect to the database. This way the secret is managed in a single safe place, access to it can be granted in a fine-grained manner, and no secret information is buried in environment variables.

The rest of the migrateDatabase method is kept rather simple: it creates a database connection, starts up a Liquibase context and runs the upgrades described in the Liquibase changelog.
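A minimal sketch of these steps could look as follows. The environment variable names match the CDK stack above; the secret's JSON field names are those of an RDS-generated secret, while the JSON parsing with Jackson and the changelog file name are assumptions of mine rather than the repo's exact code.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import liquibase.Contexts;
import liquibase.LabelExpression;
import liquibase.Liquibase;
import liquibase.database.Database;
import liquibase.database.DatabaseFactory;
import liquibase.database.jvm.JdbcConnection;
import liquibase.resource.ClassLoaderResourceAccessor;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;

import java.sql.Connection;
import java.sql.DriverManager;

void migrateDatabase() throws Exception {
    // Fetch the generated database credentials from SecretsManager
    SecretsManagerClient client = SecretsManagerClient.builder()
            .region(Region.of(System.getenv("REGION")))
            .build();
    String secretJson = client.getSecretValue(GetSecretValueRequest.builder()
            .secretId(System.getenv("DATABASE_SECRET_NAME"))
            .build()).secretString();
    JsonNode secret = new ObjectMapper().readTree(secretJson);

    // Build the JDBC URL from the secret and open a connection
    String jdbcUrl = "jdbc:postgresql://" + secret.get("host").asText()
            + ":" + secret.get("port").asText() + "/" + secret.get("dbname").asText();
    try (Connection connection = DriverManager.getConnection(jdbcUrl,
            secret.get("username").asText(), secret.get("password").asText())) {
        // Let Liquibase apply all pending changesets from the changelog
        Database database = DatabaseFactory.getInstance()
                .findCorrectDatabaseImplementation(new JdbcConnection(connection));
        new Liquibase("liquibase/changelog.xml", new ClassLoaderResourceAccessor(), database)
                .update(new Contexts(), new LabelExpression());
    }
}
```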

The Liquibase changelog, located in src/main/resources/liquibase, contains just two changesets: one for the database schema creation and one for an example table, so we can see that the migration happened and something has changed.
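Such a changelog could look like this. The changeset ids, author and column definitions here are illustrative; only the demo schema and the users table are taken from the article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.1.xsd">

    <!-- changeset 1: create the schema -->
    <changeSet id="1" author="demo">
        <sql>CREATE SCHEMA IF NOT EXISTS demo</sql>
    </changeSet>

    <!-- changeset 2: create an example table so we can verify the migration ran -->
    <changeSet id="2" author="demo">
        <createTable schemaName="demo" tableName="users">
            <column name="id" type="uuid">
                <constraints primaryKey="true" nullable="false"/>
            </column>
            <column name="name" type="varchar(255)"/>
        </createTable>
    </changeSet>

</databaseChangeLog>
```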

Deploy the Stacks

Now that we have looked at the components and some of the configuration, we can deploy our CDK stacks. If you want to try this yourself, I encourage you to check out the GitHub repo with all the necessary code. I've left out some of the more general parts in this blog post, but you can find the whole thing there and run it.

To deploy our three stacks to your locally configured AWS account, all you need to do is run cdk deploy --all.

Also good to know if you want to clean up afterwards: all you need to run is cdk destroy --all and optionally delete any remaining log groups in CloudWatch. (Destroying the stacks may take a while.)

What happened in the AWS account/Where can I see results?

Let's assume you deployed the stack and get this nice message in your console:

 ✅  ServerlessRdsUpdatesStack/DatabaseMigrationStack

Now you might wonder, what happened? How can I see if the database migration was executed?

Let's check the AWS console to see if everything worked as it should. First, log into the AWS account you deployed the stack to and navigate to the CloudFormation service. There you should find our three stacks in CREATE_COMPLETE status.

With this checked, we can go ahead and see if our database migration ran and created the table we wanted. For this we need to open AWS SecretsManager and RDS.

In AWS SecretsManager, look for a secret named /aurora/databaseSecrets in the list of secrets. On the detail page, take note of the ARN of this secret, because we need it to connect to our database in a minute. If you want, you can also check the username, password and other details needed to connect to the database by clicking the Retrieve secret value button.

In the RDS service we can open the Query Editor and directly access a database using a secret from SecretsManager. So let's open the RDS Query Editor, choose the database we just created, choose to connect with a Secrets Manager ARN, and enter the ARN of the secret we just noted. If you used the demo, the database name is simply demo. Afterwards you should be ready to connect to the database.

After connecting, we can use the Query Editor to query our database. A query is already prefilled, and we can use it to check whether our Liquibase migration was executed: just hit the Run button. Once the query has executed, we find our new users table in the result window.

To interact with the table, feel free to run select, insert or delete operations on demo.users using the Query Editor.

Conclusion

Automating deployment and change processes is an important topic for many projects. Moving faster and shipping features more quickly and reliably is pretty much inevitable, so automating every part of your deployment is a necessity. If you ever run into the question of how to automate upgrades of your relational database, this solution might be something to consider. However, if you have no strong reason to use a relational database, I encourage you to also have a look at schemaless options like DynamoDB.
