
How to upgrade your Aurora Serverless Database Schema using CDK and Lambda

16.1.2023 | 11 minutes of reading time

Imagine the following situation: You are building a serverless application using, for example, Lambda functions; you set up your system with CDK (or CloudFormation) and store your data in Aurora Serverless. How would you automate database schema changes or prefill tables in your database? You don't know when your Lambdas will be started or how many there will be. Maybe multiple services connect to the same datastore and none of them is a good fit for running your upgrades?

Why not manage your database schema changes within your infrastructure setup with CDK? In this blog article I will describe how to set up a CDK stack that applies database changes with Liquibase whenever you deploy your infrastructure changes, with an example you can try alongside in your own AWS account.

Setup/Preconditions

As mentioned above, the approach uses CDK. If you want to follow along and deploy the example in your own AWS account, I recommend installing CDK on your machine along with its prerequisites. Docker is also used when deploying the stacks into your AWS account and therefore needs to be installed as well.

If you have not used CDK before, also make sure that the AWS account you want to use is bootstrapped; if it is not, run cdk bootstrap aws://ACCOUNT-NUMBER/REGION.

How do we do all this?

Architecture/Component Overview

The example is composed of multiple stacks to separate the different components that are used.

  1. We have an application stack that sets up our networking components, such as a VPC and security groups, and creates our database and database migration stacks.
  2. Next we have a stack to create and configure everything for our Aurora Serverless database.
  3. Last but not least we have a stack that includes everything we need to do automatic database upgrades with Liquibase.

The picture below depicts the used constructs and stacks to give more of an overview over the solution.

As I will reference files directly in the following description, I recommend checking out the source code of the example application, or at least having a look at it on GitHub!

Networking infrastructure

The networking setup for this example is nothing special and can be found in the application-stack.ts. However, it is needed because Aurora Serverless is deployed into a VPC, in most cases into the private subnets of said VPC. Therefore we need some configuration to enable our Lambda to talk to Aurora Serverless.

We use two separate security groups to control access to the database and the Lambda. By adding an ingress rule, we allow traffic from our database migration security group through our database security group on the default PostgreSQL port 5432.

const databaseSecurityGroup = new SecurityGroup(
    this,
    "DatabaseSecurityGroup",
    {
        securityGroupName: "DatabaseSecurityGroup",
        vpc,
    }
);
const databaseMigrationSecurityGroup = new SecurityGroup(
    this,
    "DatabaseMigrationSecurityGroup",
    {
        securityGroupName: "DatabaseMigrationSecurityGroup",
        vpc,
    }
);
databaseSecurityGroup.addIngressRule(
    databaseMigrationSecurityGroup,
    Port.tcp(5432),
    "allow access for database migration lambda"
);

RDS Aurora Serverless

As with the networking components, the RDS Aurora Serverless setup is rather standard and can be found in the database-stack.ts. The database credentials are generated when the database stack is created and are stored securely in AWS SecretsManager, so our database migration Lambda can access them later.

this.credentials = Credentials.fromGeneratedSecret(username, {
    secretName: "/aurora/databaseSecrets",
});

We also limit the subnets our database cluster can use to only our private subnets.

const databaseSubnetGroup = new SubnetGroup(this, "DatabaseSubnetGroup", {
    description: "SubnetGroup for Aurora Serverless",
    vpc: props.vpc,
    vpcSubnets: props.vpc.selectSubnets({
        subnetType: SubnetType.PRIVATE_WITH_EGRESS,
    }),
});

As of right now, only a few PostgreSQL versions are supported by Aurora Serverless, hence the Postgres engine version 11.16. I recommend checking for newer supported versions when you set up an Aurora Serverless cluster using Postgres.

this.database = new ServerlessCluster(this, "DemoCluster", {
    engine: DatabaseClusterEngine.auroraPostgres({
        version: AuroraPostgresEngineVersion.VER_11_16,
    }),
    credentials: this.credentials,
    defaultDatabaseName: "demo",
    vpc: props.vpc,
    subnetGroup: databaseSubnetGroup,
    securityGroups: [props.databaseSecurityGroup],
    scaling: {
        autoPause: Duration.minutes(5),
        minCapacity: AuroraCapacityUnit.ACU_2,
        maxCapacity: AuroraCapacityUnit.ACU_4,
    },
});

The scaling options are set to a minimum for this demo and include pausing the database after 5 minutes of inactivity. This can result in longer startup times.

Database migration with CDK Custom Resource

Now that the surrounding infrastructure is defined, we can take care of the infrastructure components we need for the database migration.

In total we will need two policy statements for the Lambda, one role that is used while executing the Lambda, the Lambda function itself, a custom resource provider and a custom resource.

As mentioned while setting up the Aurora Serverless cluster, we store the credentials for our database in AWS SecretsManager. Therefore we need to allow our Lambda to access SecretsManager by creating the secretsManagerPolicyStatement. Also, as hinted at while creating the network infrastructure, we need to allow our Lambda to run in a VPC. In AWS this is achieved through an elastic network interface (ENI) that the Lambda creates and connects to. The Lambda can then reach services located in the private subnets of the VPC through the ENI deployed in the VPC. This is allowed through the policy statement createENIPolicyStatement.

For the policies to take effect while executing the Lambda, we create a role that extends the AWSLambdaBasicExecutionRole managed policy with the previously described policies.
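Put together, the two statements and the role might look roughly like this. This is a sketch: the exact action lists and resource scoping are assumptions on my part, so check the repo for the real definitions.

```typescript
import {
    Effect,
    ManagedPolicy,
    PolicyStatement,
    Role,
    ServicePrincipal,
} from "aws-cdk-lib/aws-iam";

// Allow reading the database secret. In a real setup you would scope
// the resource down to the ARN of the generated secret instead of "*".
const secretsManagerPolicyStatement = new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["secretsmanager:GetSecretValue"],
    resources: ["*"],
});

// Allow the Lambda to create and manage the ENI it uses inside the VPC.
const createENIPolicyStatement = new PolicyStatement({
    effect: Effect.ALLOW,
    actions: [
        "ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DeleteNetworkInterface",
    ],
    resources: ["*"],
});

// Execution role based on the AWSLambdaBasicExecutionRole managed policy,
// extended with the two statements above.
const databaseMigrationFunctionRole = new Role(this, "DatabaseMigrationFunctionRole", {
    assumedBy: new ServicePrincipal("lambda.amazonaws.com"),
    managedPolicies: [
        ManagedPolicy.fromAwsManagedPolicyName("service-role/AWSLambdaBasicExecutionRole"),
    ],
});
databaseMigrationFunctionRole.addToPolicy(secretsManagerPolicyStatement);
databaseMigrationFunctionRole.addToPolicy(createENIPolicyStatement);
```

The snippet assumes it lives inside the stack class (hence `this`); the role is later passed to the Lambda function as databaseMigrationFunctionRole.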

Let's take a look at the Lambda function. As the function uses Liquibase and is written in Java, we need to set the runtime to a currently supported Java runtime. In this case we opt for Java 11, which runs on Amazon Corretto.

In order for the Lambda to fetch the secret from SecretsManager, we need to pass the region and the secret name into the function. We can use environment variables here, as the secret name itself does not need special protection.

To make the Lambda connect to our VPC at execution time, we also need to set the VPC, security group and subnets the Lambda should use. As our function has no need to be available to the general public, we limit it to the private subnets. Finally, to allow the Lambda to create and attach to ENIs, we set the previously created role on the Lambda.

Building and bundling the Maven function for Lambda

Lambdas in CDK come with a nice feature: you can define a build process for your Lambda when you define the Lambda itself. When you specify where your code assets are located, you can also apply a set of options. One of those is the bundling option, which can be used to build and package your code. For our use case this is quite handy: we need to run a Maven build, and by specifying it directly in CDK we do not need to add any other build and packaging system.

AWS provides a set of predefined images you can use to build your software. The Java 11 build image contains Java and Maven and thereby everything we need to build and package our code. The Lambda function then simply picks up the built and bundled artifact from the asset-output directory.

const databaseMigrationFunction = new Function(
    this,
    "DatabaseMigrationFunction",
    {
        runtime: Runtime.JAVA_11,
        code: Code.fromAsset(path.join(__dirname, "./database-migration"), {
            bundling: {
                image: Runtime.JAVA_11.bundlingImage,
                user: "root",
                outputType: BundlingOutput.ARCHIVED,
                command: [
                    "/bin/sh",
                    "-c",
                    "mvn clean install " +
                        "&& cp /asset-input/target/databaseMigration.jar /asset-output/",
                ],
            },
        }),
        handler: "migration.Handler",
        environment: {
            DATABASE_SECRET_NAME: props.credentials.secretName!,
            REGION: props.database.env.region,
        },
        vpc: props.vpc,
        vpcSubnets: props.vpc.selectSubnets({
            subnetType: SubnetType.PRIVATE_WITH_EGRESS,
        }),
        securityGroups: [props.databaseMigrationSecurityGroup],
        timeout: Duration.minutes(5),
        memorySize: 512,
        role: databaseMigrationFunctionRole,
    }
);

In order to run our database migration Lambda when deploying our infrastructure, we need to specify a CDK custom resource provider. By referencing our Lambda as the onEventHandler, our Lambda is called for every create, update and delete event of the custom resource.

const databaseMigrationFunctionProvider = new Provider(
    this,
    "DatabaseMigrationResourceProvider",
    {
        onEventHandler: databaseMigrationFunction,
    }
);

Now that we have a custom resource provider, we can go ahead and specify the associated custom resource. If we only specified the custom resource and pointed it to the custom resource provider's service token, our Lambda would be executed once, upon creation of the resource. This is a problem if you want continuous database upgrades without destroying resources regularly. Therefore, the custom resource needs to change with every deployment so that Liquibase can check for necessary database upgrades.

An easy way to achieve this is to add a date property to the resource and set it to the current date and time, giving us an ever-changing value.

new CustomResource(this, "DatabaseMigrationResource", {
    serviceToken: databaseMigrationFunctionProvider.serviceToken,
    properties: {
        date: new Date(Date.now()).toUTCString(),
    },
});

The database migration code

The database migration is done using Liquibase. As Liquibase requires Java to run, we also write our Lambda in Java. AWS offers good documentation on how to write Lambda functions in Java, so most of the general handler setup and dependencies can be found there. For our use case, we need to add secretsmanager, liquibase-core and postgresql to our Maven dependencies in the pom.xml.

As you may notice in our Handler.java class, we do not implement the basic RequestHandler interface as a regular Lambda would. By extending AbstractCustomResourceHandler, our Lambda gets a few methods to implement, depending on what happens to our CDK resource. For this example we want to run the database migration when the CDK resource is created as well as when it is updated. On resource deletion, Liquibase is not executed – this may be different depending on your use case!
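The shape of such a handler looks roughly like this. Note this is a sketch assuming the cloudformation module of Powertools for AWS Lambda (Java); the exact base class and response type in the repo may differ.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.events.CloudFormationCustomResourceEvent;
import software.amazon.lambda.powertools.cloudformation.AbstractCustomResourceHandler;
import software.amazon.lambda.powertools.cloudformation.Response;

public class Handler extends AbstractCustomResourceHandler {

    @Override
    protected Response create(CloudFormationCustomResourceEvent event, Context context) {
        // run Liquibase when the custom resource is first created
        migrateDatabase();
        return Response.success("DatabaseMigrationResource");
    }

    @Override
    protected Response update(CloudFormationCustomResourceEvent event, Context context) {
        // the ever-changing date property triggers an update on every deployment
        migrateDatabase();
        return Response.success(event.getPhysicalResourceId());
    }

    @Override
    protected Response delete(CloudFormationCustomResourceEvent event, Context context) {
        // intentionally no Liquibase run on resource deletion
        return Response.success(event.getPhysicalResourceId());
    }

    private void migrateDatabase() {
        // fetch the secret and run Liquibase, as described in the next section
    }
}
```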

For the migrateDatabase method to actually migrate something, we first need to fetch the secret from AWS SecretsManager in order to connect to the database. This way the secret is managed in a single safe place, access to it can be granted in a fine-grained manner, and no secret information is buried in environment variables.

The rest of the migrateDatabase method is kept rather simple: it creates a database connection, starts up a Liquibase context and runs the upgrades described in the Liquibase changelog.
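A minimal sketch of these steps could look as follows. The environment variable names match the CDK stack above; the secret's JSON field names are those of an RDS-generated secret, while the JSON parsing with Jackson and the changelog file name are assumptions of mine rather than the repo's exact code.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import liquibase.Contexts;
import liquibase.LabelExpression;
import liquibase.Liquibase;
import liquibase.database.Database;
import liquibase.database.DatabaseFactory;
import liquibase.database.jvm.JdbcConnection;
import liquibase.resource.ClassLoaderResourceAccessor;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;

import java.sql.Connection;
import java.sql.DriverManager;

void migrateDatabase() throws Exception {
    // Fetch the generated database credentials from SecretsManager
    SecretsManagerClient client = SecretsManagerClient.builder()
            .region(Region.of(System.getenv("REGION")))
            .build();
    String secretJson = client.getSecretValue(GetSecretValueRequest.builder()
            .secretId(System.getenv("DATABASE_SECRET_NAME"))
            .build()).secretString();
    JsonNode secret = new ObjectMapper().readTree(secretJson);

    // Build the JDBC URL from the secret and open a connection
    String jdbcUrl = "jdbc:postgresql://" + secret.get("host").asText()
            + ":" + secret.get("port").asText() + "/" + secret.get("dbname").asText();
    try (Connection connection = DriverManager.getConnection(jdbcUrl,
            secret.get("username").asText(), secret.get("password").asText())) {
        // Let Liquibase apply all pending changesets from the changelog
        Database database = DatabaseFactory.getInstance()
                .findCorrectDatabaseImplementation(new JdbcConnection(connection));
        new Liquibase("liquibase/changelog.xml", new ClassLoaderResourceAccessor(), database)
                .update(new Contexts(), new LabelExpression());
    }
}
```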

The Liquibase changelog, located in src/main/resources/liquibase, contains just two changesets: one for the database schema creation and one for an example table, so we can see that the migration happened and something has changed.
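Such a changelog could look like this. The changeset ids, author and column definitions here are illustrative; only the demo schema and the users table are taken from the article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.1.xsd">

    <!-- changeset 1: create the schema -->
    <changeSet id="1" author="demo">
        <sql>CREATE SCHEMA IF NOT EXISTS demo</sql>
    </changeSet>

    <!-- changeset 2: create an example table so we can verify the migration ran -->
    <changeSet id="2" author="demo">
        <createTable schemaName="demo" tableName="users">
            <column name="id" type="uuid">
                <constraints primaryKey="true" nullable="false"/>
            </column>
            <column name="name" type="varchar(255)"/>
        </createTable>
    </changeSet>

</databaseChangeLog>
```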

Deploy the Stacks

Now that we have looked at the components and some of the configuration, we can deploy our CDK stacks. If you want to try this yourself, I encourage you to check out the GitHub repo with all the necessary code. I've left out some of the more general parts in this blog post, but you can find the whole thing there and run it.

To deploy our three stacks to your locally configured AWS account, all you need to do is run cdk deploy --all.

Also good to know if you want to clean up afterwards: all you need to run is cdk destroy --all and optionally delete any remaining log groups in CloudWatch. (Destroying the stacks may take a while.)

What happened in the AWS account/Where can I see results?

Let's assume you deployed the stack and get this nice message in your console:

 ✅  ServerlessRdsUpdatesStack/DatabaseMigrationStack

Now you might wonder, what happened? How can I see if the database migration was executed?

Let's check the AWS console to see if everything worked as it should. First, log into the AWS account you deployed the stack to and navigate to the CloudFormation service. There you should find our three stacks in CREATE_COMPLETE status.

With this checked, we can go ahead and see if our database migration ran and created the table we wanted. For this we need to open AWS SecretsManager and RDS.

In AWS SecretsManager, look for a secret named /aurora/databaseSecrets in the list of secrets. On the detail page, take note of the ARN of this secret, because we need it to connect to our database in a minute. If you want, you can also check the username, password and other details needed to connect to the database by clicking the Retrieve secret value button.

In the RDS service we can open the Query Editor and directly access a database using a secret from SecretsManager. So let's open the RDS Query Editor, choose the database we just created, choose to connect with a Secrets Manager ARN, and enter the ARN of the secret we just noted. If you used the demo, the database name is simply demo. Afterwards you should be ready to connect to the database.

After connecting, we can use the Query Editor to query our database. A query is already prefilled, and we can use it to check whether our Liquibase migration was executed: just hit the Run button. Once the query has executed, we find our new users table in the result window.

To interact with the table, feel free to run select, insert or delete operations on demo.users using the Query Editor.

Conclusion

Automating deployment and change processes is an important topic for many projects. Moving faster and shipping features more quickly and reliably is pretty much inevitable, so automating every part of your deployment is a necessity. If you ever run into the question of how to automate upgrades of your relational database, this solution might be something to consider. However, if you have no strong reason to use a relational database, I encourage you to also have a look at schemaless options like DynamoDB.
