AWS CodeBuild: First Contact with Monorepo

Have you ever tried to integrate a GitHub-hosted monorepo with AWS build services like CodePipeline and CodeBuild? We did, and we failed. In this blog post, we describe the challenges you might face when building a pipeline for your monorepo. Even better, we also share our solution as Infrastructure as Code written in Terraform. Feel free to give feedback if you find a better solution!

Working in a Monorepo

In our current project, we use a large GitHub-hosted monorepo. This repository contains multiple applications that have to be tested and deployed separately. Several teams develop within this repository, each responsible for its own applications. As a result, we see many pushes during a workday. Those pushes have to be filtered so that they don’t trigger the build pipelines of other teams.

Source Code Hosting and Deployment

GitHub is one of the most used hosting services for version control with Git. Many development teams face the challenge of integrating it with build services on different cloud providers, like Google Cloud or Amazon Web Services (AWS). As we deploy our application on AWS, we think it’s a good idea to also use the AWS services for Continuous Integration/Continuous Deployment (CI/CD). So, we started to use AWS CodeBuild projects for building our React single-page application, building our Spring Boot-based backend, executing deployments, integration testing, and message localization. This lets us rely on AWS IAM roles and permissions for our deployment, so we don’t need technical users accessing our AWS account.

AWS CodePipeline is the service that orchestrates the different build steps. It provides a limited number of mechanisms to pull the source code and trigger a build process. Those include GitHub webhooks, AWS CodeCommit, S3 buckets, and AWS ECR (the AWS Docker image repository).

Triggering the Build Selectively

As we mentioned before, our AWS build pipeline got triggered about 20 times a day without pushing changes to source code owned by our team. The main disadvantages were a waste of paid computing resources and – causing most of the pain – a longer waiting time between one of our pushes and a deployment. Having to face this challenge, we decided to find a better way to trigger our CI/CD pipeline.

CodeBuild as a Trigger for CodePipeline

Evaluating our options, we found two reasonable candidates: the CodePipeline source stage or a single CodeBuild project. At first sight, we thought that the GitHub webhook used by the CodePipeline source stage would do the trick for us. However, this was not the case.

Looking into the source stage configuration, we found that PUSH events could only be filtered by selecting a certain branch. One could argue that we could simply use a different branch for each team, but we are doing Trunk-Based Development instead of GitFlow. So, this was a no-go for us.

As a result, we took a closer look at CodeBuild projects to trigger our builds.

Get Everything out of CodeBuild Projects

AWS CodeBuild projects support GitHub webhooks for private repositories in your GitHub account. This means you can use a personal access token to link a CodeBuild project to your repository. Every code change that is pushed to the repository then triggers a build of the source code.

Connect CodeBuild project to GitHub with personal access token

Doing so, you can also configure webhook event filter groups that react to events like PUSH or the creation, update, and reopening of a PULL_REQUEST. We can now filter on branch and file path (actor filtering is also possible).

Configure CodeBuild project webhook filtering
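
To make the filtering behaviour more tangible, here is a simplified model of how filter groups are evaluated: a build starts if at least one group matches, and a group matches only if all of its filters match. This is a sketch for illustration, not AWS code. The event dictionary, the folder names team-a and team-b, and the master branch are assumptions, and options such as excludeMatchedPattern are left out.

```python
import re
from typing import Any, Dict, List

Filter = Dict[str, str]


def filter_matches(flt: Filter, event: Dict[str, Any]) -> bool:
    """Return True if a single filter accepts the (simplified) webhook event."""
    if flt["type"] == "EVENT":
        # EVENT patterns are a comma-separated list of event names.
        allowed = [name.strip() for name in flt["pattern"].split(",")]
        return event["event"] in allowed
    pattern = re.compile(flt["pattern"])
    if flt["type"] == "HEAD_REF":
        return pattern.search(event["ref"]) is not None
    if flt["type"] == "FILE_PATH":
        # One changed file that matches the pattern is enough.
        return any(pattern.search(path) for path in event["files"])
    raise ValueError(f"filter type not covered by this sketch: {flt['type']}")


def should_trigger(filter_groups: List[List[Filter]], event: Dict[str, Any]) -> bool:
    """OR across groups, AND across the filters inside one group."""
    return any(all(filter_matches(f, event) for f in group) for group in filter_groups)


# Hypothetical setup: only pushes to master that touch team-a/ should build.
groups = [[
    {"type": "EVENT", "pattern": "PUSH"},
    {"type": "HEAD_REF", "pattern": "^refs/heads/master$"},
    {"type": "FILE_PATH", "pattern": "^team-a/"},
]]
own_push = {"event": "PUSH", "ref": "refs/heads/master",
            "files": ["team-a/frontend/src/App.js"]}
other_push = {"event": "PUSH", "ref": "refs/heads/master",
              "files": ["team-b/service/pom.xml"]}

print(should_trigger(groups, own_push))    # True
print(should_trigger(groups, other_push))  # False
```

With a file path filter like this one, a push that only touches another team's folder no longer starts our build.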

Now, we had our push event filtering issue solved. We moved on to integrate CodeBuild as a trigger for our build pipeline. Since we already had our build pipeline using CodePipeline in place, we simply had to switch the source stage to work with our filtering CodeBuild project. In order to do so, we set an S3 bucket as the source for our CodePipeline. The build process starts every time the file identified by an S3 object key changes. So, we implemented a buildspec (a YAML file describing the actions of a build step in a CodeBuild project) that pushes our source code to an encrypted S3 bucket, triggering the build process.

To optimize the monorepo setup, we took advantage of the possibility to push only the source code owned by our team further on to the pipeline. We zip only the folders that contain relevant source code. In our case, this significantly decreased the size of the code passed between stages. Including the .git directory in the pulled source code enabled us to use git commands (like git log) within the buildspec.
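
The packaging step can be sketched as follows. This is a minimal illustration rather than our actual buildspec: the helper zip_folders, the folder names, and the archive name source.zip are hypothetical, and the demo runs on a throwaway directory instead of a real checkout.

```python
import os
import tempfile
import zipfile
from typing import Iterable


def zip_folders(repo_root: str, folders: Iterable[str], archive_path: str) -> int:
    """Zip only the given folders of repo_root and return the number of files added."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder in folders:
            for dirpath, _dirnames, filenames in os.walk(os.path.join(repo_root, folder)):
                for name in filenames:
                    full_path = os.path.join(dirpath, name)
                    # Keep paths relative to the repository root so the
                    # layout survives unzipping in later pipeline stages.
                    archive.write(full_path, os.path.relpath(full_path, repo_root))
                    count += 1
    return count


# Demo on a temporary directory that mimics a tiny monorepo checkout.
repo = tempfile.mkdtemp()
for rel_path in ("team-a/src/App.js", "team-b/src/Main.kt", ".git/HEAD"):
    target = os.path.join(repo, *rel_path.split("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as handle:
        handle.write("placeholder\n")

# Write the archive outside of the zipped folders.
archive_file = os.path.join(tempfile.mkdtemp(), "source.zip")
added = zip_folders(repo, ["team-a", ".git"], archive_file)
with zipfile.ZipFile(archive_file) as archive:
    names = sorted(archive.namelist())
print(added, names)
```

Listing .git next to the team folder keeps the history in the archive, so git commands still work further down the pipeline, while the code of other teams stays out.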

For better visualization, we included an excerpt of our architecture diagram below.

Build Pipeline architecture with CodeBuild projects

Producing Infrastructure as Code

Following good practices for developing cloud-hosted applications, we implemented our build pipeline configuration as Infrastructure as Code. We found Terraform to be a suitable tool for doing this. Developing Infrastructure as Code speeds up deployments and reduces human error, since we don’t have to do configurations manually and are able to automate infrastructure deployments in different environments.

Not There Yet

Terraform offers implementations of AWS CodeBuild projects (aws_codebuild_project) as well as CodeBuild webhooks (aws_codebuild_webhook). However, the webhook filter group events (see Get Everything out of CodeBuild Projects) are not yet implemented by the folks from HashiCorp (there is an open issue). To deal with this, we call the AWS command line interface using a Terraform null_resource. With the help of this AWS CLI call, we are able to update our webhook filter group events.

Enough with the talking – let’s see some code. We built a reusable Terraform module that creates all the resources needed. For the execution of the code you need a configured AWS CLI as well as a Terraform CLI.

This module creates and updates a webhook for the given CodeBuild project myCodeBuildProject. For each of the elements in the file_path_placeholders list, a different event filter group JSON snippet is created. It is added to a final JSON which updates the webhook through the AWS CLI. You can find the complete sample module code on GitHub. Please be aware that we did not include all possible filter event types. If you want to use the sample code, feel free to extend it according to your needs.
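
To illustrate the idea outside of Terraform, the following sketch builds one filter group per file path pattern and assembles the corresponding aws codebuild update-webhook call. It is not the module code itself. The function names, the example patterns, and the master branch are assumptions made for this illustration.

```python
import json
import shlex
from typing import Dict, List


def build_filter_groups(file_paths: List[str], branch: str = "master") -> List[List[Dict[str, str]]]:
    """Create one PUSH filter group per file path pattern (branch name is an assumption)."""
    groups = []
    for path in file_paths:
        groups.append([
            {"type": "EVENT", "pattern": "PUSH"},
            {"type": "HEAD_REF", "pattern": f"^refs/heads/{branch}$"},
            {"type": "FILE_PATH", "pattern": path},
        ])
    return groups


def update_webhook_command(project_name: str, file_paths: List[str]) -> str:
    """Return a shell-quoted AWS CLI command that sets the webhook filter groups."""
    payload = json.dumps(build_filter_groups(file_paths))
    args = ["aws", "codebuild", "update-webhook",
            "--project-name", project_name,
            "--filter-groups", payload]
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical patterns standing in for the file_path_placeholders list.
patterns = ["^frontend/.*", "^backend/.*"]
command = update_webhook_command("myCodeBuildProject", patterns)
print(command)
```

Because a build starts as soon as any one group matches, a separate group per path means that a change in either folder triggers the project.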

Once the myCodeBuildProject CodeBuild project gets triggered and has access to the source code, the following buildspec executes. It’s responsible for uploading the source code to the S3 bucket that is linked to the CodePipeline.

The configuration of the CodeBuild project makes the environment variables available. They identify the S3 bucket and the object key of the zip file.
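
As a rough sketch of how these environment variables come into play, the snippet below derives the upload target from them and builds an aws s3 cp call. The variable names SOURCE_BUCKET and SOURCE_OBJECT_KEY, the function name, and the example values are hypothetical. The names in a real project depend on its CodeBuild configuration.

```python
import os
from typing import List, Mapping


def s3_upload_command(archive: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Build the AWS CLI call that copies the zipped source to the pipeline bucket."""
    missing = [name for name in ("SOURCE_BUCKET", "SOURCE_OBJECT_KEY") if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    target = f"s3://{env['SOURCE_BUCKET']}/{env['SOURCE_OBJECT_KEY']}"
    return ["aws", "s3", "cp", archive, target]


# Example values, passed explicitly instead of reading the real environment.
example_env = {"SOURCE_BUCKET": "my-pipeline-source", "SOURCE_OBJECT_KEY": "team-a/source.zip"}
upload = s3_upload_command("source.zip", example_env)
print(" ".join(upload))
```

Inside a build, the returned list could be handed to subprocess.run(upload, check=True). Since CodePipeline watches exactly this object key, each upload starts a new pipeline run.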

Think Further, Become Faster

Our implementation of a build trigger with filtering functionality opens up further possibilities of improving your CI/CD pipeline.

In addition to the missing build trigger functionality, we faced the issue that our build time was continuously increasing. Downloading a huge number of dependencies during the execution of `yarn install` made our frontend build very lengthy. Therefore, we created a Docker image with pre-installed dependencies. Whenever the package.json file changes, a dedicated build trigger starts a rebuild of the Docker image with updated dependencies.

We are now packaging and uploading the source code to S3 from within the CodeBuild project that triggers the build pipeline. So, we are able to package only specific parts of the source code. Furthermore, the versioning functionality for files hosted in an S3 bucket allows us to restart the build with a specific object version. We no longer have to search for the commit ID in the GitHub repository.

Wrap-up

Our specific issue is related to working with a monorepo hosted on GitHub and deploying with the AWS CI/CD services. The CI/CD integration of AWS has some great features, but it lacks an advanced integration with external hosting services for version control. With the help of a CodeBuild project, we showed you how to handle triggering for build pipelines. For the sake of speed and reproducibility, we implemented the required infrastructure as code and showed you how to deal with Terraform limitations.

We hope that this article can be of help for you and we wish you happy coding!

Marco Berger

Marco has been working as a Software Developer and IT Consultant for codecentric AG in Stuttgart since October 2018. He holds a degree in Business Information Systems and had been in the IT business for more than four years before joining codecentric. For two of those years, he worked as a Consultant and Specialist for Content Management Systems. Furthermore, he gained two years of experience as a System Integration Engineer in the mobility sector.

He focuses on Infrastructure as Code and running applications in the cloud. He appreciates simplicity in software products and likes to put different components together to create new solutions with high value for his customers. One of his favourite occupations is the deletion of unused code.

Adrian joined codecentric Stuttgart in February 2019 as an IT Consultant. Working to connect cars to the cloud, he developed a passion for small electrical devices, aka IoT. He loves to code in Kotlin, always keeping an eye on applying Clean Code principles. Additionally, he has a deep interest in solutions based on non-blocking and distributed systems.