Cost-effective batch jobs on AWS’ serverless infrastructure

3.6.2020 | 7 minutes of reading time

There are batch jobs that require much engineering and fine-tuning on serious hardware to make them feasible. However, many batch jobs run on oversized infrastructure and accumulate much more costs than necessary. Migrating these jobs to a serverless approach generates two advantages: a simplified workflow and massive savings in the long run. In this blog post, I will walk you through one exemplary architecture on AWS. It will showcase how batch jobs can run both reliably and cost-effectively in a serverless environment.

Note: this post provides an overview of the architecture and the basic principles. If you want to reproduce the architecture, take a look at the repository here which includes all the necessary code.

What are the requirements?

For the sake of our example, let’s consider three requirements:

The interface for data in- and output of the service should be S3 buckets, i.e., general cloud storage.
The service needs to run daily and should be as cheap as possible.
The client’s developers want to focus on the processing logic, not the cloud infrastructure.

I will show you how these requirements translate to a serverless cloud architecture in a minute. But first, let’s talk about why serverless lends itself as a design approach.

Why choose a serverless approach?

There is a myriad of content about what serverless can and cannot do. I will, therefore, not spend time with generalized ruminations. Instead, let me justify serverless along the requirements defined above. One of the main reasons links to the development workflow; the other one is aboutcosts.

From a workflow perspective, a serverless approach provides a level of abstraction that developers highly appreciate. It frees them from managing and maintaining infrastructure. Since the cloud providers shoulder most of the usual responsibilities, developers can focus on implementing new features and adding business value. Such focus accelerates deployment and builds strong ownership.

When it comes to costs, serverless is – as a rule of thumb – cheaper than self-managed alternatives. Even when self-managed infrastructure seems to be more affordable, the wages for specialized teams typically outweigh the savings. Given the current state of the industry, it is tough to be more effective than the workforce behind the cloud services of AWS, Azure, or Google.

How to build the architecture for a serverless batch job?

Let me walk you through the proposed architecture in three steps that cover the organizational context, the overall architecture, and the specific building blocks and workflows.

Who will work with the implemented architecture?

Frequently, people develop cloud architectures without thinking about their users. In our case, we have to account for two roles to make it work. First, developers implement the processing logic. They should not be bothered with infrastructure details. Second, cloud engineers keep the system running. They should not be bothered with the details of the implemented logic.

In other words, we want to build a system with distinct realms of responsibilities and unambiguous touchpoints between them. To be clear, I do not advocate an organizational or cultural border between two camps. Instead, such a design aims at streamlining communication to decrease frustration and delays.

How is the overall architecture structured?

The big picture of this architecture contains three parts: networking, IAM roles and policies, and cloud services. The networking layer enables the services to communicate and protects against unauthorized access. The IAM roles and policies configure which actions services are allowed to take. The cloud services provide the infrastructure to integrate the business logic and execute the batch job. Due to the complexity in each part, this blog post focuses on the general workflow to keep things straightforward. As mentioned above, details will be available in related blog posts. Here is a graphical overview of the proposed architecture:

Blue indicates parts of the architecture that belong to development. In contrast, the orange box shows which building blocks run the batch job during operations. As you can see, both workflows share only one building block.

By conceptualizing the service in this way, developers and cloud engineers are largely decoupled and can focus on their respective responsibilities. Whenever developers push code to the master branch, the system builds a new Docker image. The next time the batch job runs, it will use the updated container image, i.e., run the updated business logic.

Developers need cloud engineers’ direct support in just two scenarios:

An update to the business logic requires access to additional cloud resources. In this case, a cloud engineer adjusts the IAM roles and policies.
An update necessitates different computational resources to run than the previous version. In this case, a cloud engineer needs to change the configuration of the hardware assigned to the batch job.

Both of these scenarios are easy to communicate and to account for by developers and cloud engineers.

Which role do the individual building blocks play?

The four development components

CodeCommit repositories are git repositories hosted on AWS with limited functionalities. Yet, they are a simple solution for keeping all the code within the AWS ecosystem. It is also possible to connect other versioning services, such as GitHub or Bitbucket, into the service via webhooks.

CodeBuild is the AWS flavor of a CI pipeline’s building part. It reads a specification file directly from the connected repository. CodeBuild then executes the instructions. Here, the CodeBuild project uses Docker to build a container image and pushes it to the registry for later use.

CodePipeline takes care of the orchestration and artifacts of CodeCommit and CodeBuild. In this case, it reacts to updates in the master branch of the code repository and triggers a new build.

The Elastics Container Registry (ECR) is the final building block of the developer workflow. The ECR is where CodeBuild pushes the new container images. The batch job collects it later on from here. The ECR is also where the realms of developers and engineers overlap.

The two batch job components

The computational heart of the service is the Elastic Container Service (ECS) in its Fargateflavor. Fargate is the serverless capacity provider of AWS. The user does not have to allocate VMs or other resources. Instead, a task definition includes specifications on how much computational power and memory are needed.

The second building block is a CloudWatch Trigger. Triggers can fire based on cron expressions or in intervals. Depending on whether the timing is essential, both can be viable options. When the trigger fires, a new instance of the task is sent to Fargate and executed.

These are already the main building blocks of this architecture. In production, some more services are necessary. For instance, there are S3 buckets for data storage, connections to CloudWatch logs for monitoring and debugging, and network configurations, such as VPC, subnets, and routing tables.

What are the alternatives?

Serverless is not a panacea; neither is this architecture. To leave you with an idea of when it is promising to follow this path, I want to sketch two alternative approaches.

Virtual Machines

The first alternative is to execute the workload on a dedicated virtual machine, i.e., an EC2 instance on AWS. There are two main ways to do this for the scenario described above.

First, one can keep an instance running and implement a Cron job (or something comparable) on it. I saw this pattern in a previous project, but it is a terrible choice for (at least) two reasons. The costs are way higher than the serverless approach because the VM costs money no matter whether its load is high or low. From the point of flexibility, changing the logic only works by terminating the process, updating the code, and restarting again.

Second, one could boot an instance for the batch job and terminate it afterward. That is similar to the approach above, but with one significant caveat: there are way more moving parts that can break during the process.

Serverless Functions

On the other side of the spectrum are serverless functions or Lambda functions in the AWS ecosystem. They are lean and very cheap. However, you need to consider two things before using them:

Compared to container images, Lambda functions are less flexible. For instance, using third-party Python libraries can be painful since you have to provide a .zip file with the dependencies. Compare this hustle with a pip command as part of a Dockerfile.
Lambda functions are limited in computational power and runtime duration. These limitations can become problematic when the amount of data or the complexity of the processing increases.

Thanks for reading!

If you want to know more about how to save cloud costs in general, have a look at our Cloud Cost Cleanup offer ! If you need other help or wish to have more extensive discussions on the topics, my colleagues and I are happy to be there for you!

Was this post helpful?

Likes

Blog author

Timo Böhm

Do you still have questions? Just send me a message.

fromTimo Böhm

Höhere Business Agility durch den aktiven Umgang mit Push- und Pull-Systemen...

Im Rahmen agiler Transformationen entstehen häufig starke Reibungsflächen in der Ablauforganisation. Ein verbreitetes Beispiel ist das Aufeinandertreffen klassischer Projekt- bzw. Budgetplanungen auf der einen und einer agilen Arbeitsweise in den operativen...

Agilität
Agile
Agile Transformation
Change Management
Process Management

9.12.2022 | 10 Minuten Lesezeit

Timo Böhm

Schnelles Training eines Recommendation-Modells durch BigQuery ML

Machine Learning (ML) kann nur durch Modelle in der Produktion Business Value erzeugen. Allerdings kann die Zeitspanne zwischen der Entwicklung der nächsten Iteration eines Modells und dessen Einsatz in einer Produktionsumgebung massiv sein. Dies gilt...

Accelerate
Cloud
Data
Google Cloud
Machine Learning

26.7.2021 | 11 Minuten Lesezeit

Niklas Haas

Timo Böhm

Kürzere Time-to-Market für ML-Modelle durch Googles BigQuery ML

Machine Learning (ML) erzeugt erst dann realen Mehrwert, wenn es in Produktion benutzt wird. Allerdings kann die Zeitspanne zwischen der Entwicklung eines belastbaren Modells und dessen Einsatz frustrierend lange sein. Insbesondere in schnelllebigen ...

Agile Methoden
Cloud
Machine Learning

26.7.2021 | 5 Minuten Lesezeit

Timo Böhm

Niklas Haas

Your job at codecentric?

Jobs

Agile Developer und Consultant (w/d/m)

Alle Standorte

Public Cloud im regulierten Sektor: Das ist zu beachten

Es war längere Zeit ein weit verbreitetes und in strategischen Debatten häufig zitiertes Missverständnis, dass die Bundesanstalt für Finanzdienstleistungsaufsicht (BaFin) dem Einsatz von Public-Cloud-Anbietern wie AWS, Azure und Co. einen Riegel vorschiebt...

Cloud
Compliance

10.4.2024 | 6 Minuten Lesezeit

Marc Bialowons

Björn Bohn

Green Cloud: Daten und Emissionen sparen

Das Internet produziert jährlich 900 Millionen Tonnen CO₂ – das ist deutlich mehr als Deutschland insgesamt emittiert. Hauptverantwortlich ist der immer weiter steigende Stromverbrauch beim Transport und der Speicherung von Daten. Wenn ihr kurz darüber...

Cloud
Green IT
Softwarearchitektur
Data

11.3.2024 | 5 Minuten Lesezeit

Dennis

AZ-900-Zertifizierung: Mein How-to!

Was ist AZ-900? Azure bietet eine Reihe verschiedener Zertifizierungen an. Zu finden sind sie hier. Darunter befindet sich auch die Zertifizierung AZ-900. Bei diesem Zertifikat handelt es sich um Microsoft Certified: Azure Fundamentals. Diese prüft unter...

Azure
Cloud

2.1.2024 | 5 Minuten Lesezeit

Ege Inanc

Mit FinOps die größten Kostenfallen bei AWS S3 verhindern

In der Welt der Cloud-Technologie und insbesondere bei AWS (Amazon Web Services) ist die effiziente Verwaltung von Ressourcen von entscheidender Bedeutung, um unnötige Kosten zu vermeiden. Dieser Blogbeitrag konzentriert sich auf AWS S3 und die teuren...

AWS
Cloud

27.11.2023 | 4 Minuten Lesezeit

Lukas Miliunas

Maximilian Mayer

Cloud FinOps

Cloud FinOps bietet einen etablierten Prozess, um Kosten für den Cloudbetrieb zu reduzieren (s. auch diesen Artikel). Zu diesem Zweck bietet es ein etabliertes Cloud-unabhängiges Vorgehen, das eine Organisation schrittweise aufgreifen kann. Das Tooling...

Cloud
Cloud Native
Green IT

26.10.2023 | 5 Minuten Lesezeit

Lukas Miliunas

Marco Paga

Mehr Struktur in der Cloud mit Azure Landing Zones

Die Migration in die Cloud bringt einige Herausforderungen mit sich. Viele Unternehmen stehen vor der Frage, wie ein effizienter und sicherer Aufbau einer skalierbaren Cloud-Infrastruktur umzusetzen ist. Die Antwort auf diese Herausforderung liegt in...

Cloud
Azure
IT-Governance

4.8.2023 | 4 Minuten Lesezeit

Florian Moll

Nils Bauroth

CI/CD-Pipelines mit AWS CDK CodePipeline

Das Aufsetzen der CI/CD-Pipeline ist ein typischer Task in der Anfangszeit eines Projekts. Ist die Pipeline dann aufgesetzt, sind Änderungen nur noch selten notwendig. Dementsprechend wenig Routine entwickeln Programmierende im Umgang mit der Konfiguration...

Cloud
CI/CD
AWS

17.7.2023 | 4 Minuten Lesezeit

Dennis

Green Cloud: Nachhaltig skalieren

Wenn Softwareprojekte in die Cloud gebracht werden, versprechen wir uns davon hohe Verfügbarkeit, planbare Kosten und eine immer dem Bedarf entsprechende Skalierung. Aufgrund der grenzenlosen Angebote ist es aber auch leicht, die Komponenten eines Systems...

Cloud
Softwarearchitektur
Green IT

12.6.2023 | 5 Minuten Lesezeit

Dennis

Crossplane: Eine Lösung für hybride Cloud-Herausforderungen?

Crossplane ist ein plattformübergreifendes Kontrollsystem (Control-Plane), das das Management von Cloud-Ressourcen vereinfachen und automatisieren soll. Das Tool ermöglicht es, verschiedene Cloud-Provider und lokale Ressourcen, z. B. Kubernetes-Cluster...

Cloud
Cloud Native

12.5.2023 | 2 Minuten Lesezeit

Matthias Niehoff

Green Cloud: Ideen für eine nachhaltigere Architektur

Die ökologische Nachhaltigkeit eines Systems ist aktuell häufig noch kein Thema. Nachhaltigkeit bedeutet für mich in diesem Kontext die Reduktion der verursachten Emissionen durch gesenkten Ressourcenverbrauch – egal ob die Emissionen beim Cloudprovider...

Cloud
Softwarearchitektur
Green IT

5.5.2023 | 5 Minuten Lesezeit

Dennis

Datenanalyse auf die schnelle Art – mit Amazon Athena und GitLab

Wenn wir Erkenntnisse aus großen Datenmengen gewinnen wollen, bieten uns Cloud Service Provider inzwischen Lösungen an, dank derer wir uns kein Data Warehouse oder Hadoop-Cluster mehr in den Keller stellen müssen. AWS hat mit Athena, RedShift und EMR...

Cloud
Big Data
AWS
Serverless
GitLab

21.3.2023 | 16 Minuten Lesezeit

Maik Fleuter

Ist die Cloud der große Umweltsünder?

Rechenleistung und Speicher kosten nicht nur Geld. Sie verbrauchen auch Mengen – potenziell klimaschädlicher – Energie. Das überrascht die Wenigsten, im kollektiven Bewusstsein ist es aber bislang kaum angekommen. Sehr wohl bewusst ist es natürlich den...

Cloud

18.1.2023 | 2 Minuten Lesezeit

Matthias Niehoff

AWS Cloud Development Kit – Infrastructure as Code on Steroids

Infrastructure as Code (IaC) ist inzwischen ein alter Hut. Frameworks wie Terraform, Ansible und andere haben Standards geschaffen. Kaum jemand provisioniert produktive Systeme heute ohne IaC – sei es in der Cloud oder auf der eigenen Infrastruktur.Und...

Infrastructure as Code
AWS
Cloud

21.12.2022 | 3 Minuten Lesezeit

Matthias Niehoff

Infrastructure as Code in AWS: Keine Silver Bullet

TL;DR Es gibt keine Universalmethode. Infrastructure as Code ist ein vergleichsweise neuer Ansatz. Einige Lösungen rund um Infrastructure as Code befinden sich noch in der Entwicklung. Es gibt keinen klaren Favoriten. Die Wahl des passenden Tools hängt...

Cloud
AWS
Infrastructure as Code

13.12.2022 | 27 Minuten Lesezeit

Florian Wiech

Sören

AWS CloudFront Functions testen

Mit den CloudFront Functions bietet AWS die Möglichkeit, den Funktionsumfang von CloudFront um kleine JavaScript-Funktionen zu erweitern. AWS führt diese Funktionen direkt an den Edge-Locations aus und ermöglicht es dadurch, alle ankommenden Requests...

Cloud
AWS
Testing
Softwareentwicklung

4.10.2022 | 3 Minuten Lesezeit

Dennis

Die Zukunft der IDEs – aus Sicht eines „Java-EE-Entwicklers“

Bei unseren Kunden und auch bei codecentric dreht sich alles um den besten und schnellsten Weg, die richtige Software zu entwickeln – und das natürlich in hoher Qualität. Von daher bin ich auch ein fleißiger Leser des „State of DevOps“-Report (hier zum...

Cloud
Java
Remote Work

16.5.2022 | 11 Minuten Lesezeit

Rainer Vehns

Green Cloud: Emissionen unserer Cloud-Architektur messen

Überall wird von der Cloud geschwärmt: Grenzenlose Skalierung und unzählige Features sind bereits „out of the box“ verfügbar. Das alles gibt es zu unschlagbar günstigen Preisen. Das Thema Nachhaltigkeit kommt dabei selten zur Sprache. Rechenzentren verbrauchen...

AWS
Azure
Cloud
Google Cloud
Green IT

24.4.2022 | 6 Minuten Lesezeit

Dennis

Terraform Remote State richtig nutzen

Was ist Terraform und was ist State?Terraform ist ein Tool für die Verwaltung von Infrastruktur in Form von Code, gehört also in den sogenannten Infrastructure-as-Code-Bereich (IaC). Eine kurze Einführung und ein Vergleich zu anderen Tools findet sich...

Infrastructure
Softwarearchitektur
Cloud
DevOps

21.4.2022 | 7 Minuten Lesezeit

Alexander Kasper

Stream Processing mit Kafka Streams und Spring Boot

Kontinuierliche Datenströme in verteilten Systemen ohne Zeitverzögerung zu verarbeiten, birgt einige Herausforderungen. Wir zeigen euch, wie Stream Processing mit Kafka Streams und Spring Boot gelingen kann. Alles im Fluss: Betrachtet man Daten als fortlaufenden...

Softwarearchitektur
Cloud
IoT
Messaging
Kotlin
Spring

20.12.2021 | 20 Minuten Lesezeit

Maik Fleuter

Lukas Maier

Kürzere Time-to-Market für ML-Modelle durch Googles BigQuery ML

Agile Methoden
Cloud
Machine Learning

26.7.2021 | 5 Minuten Lesezeit

Timo Böhm

Niklas Haas

Gemeinsam bessere Projekte umsetzen.

Wir helfen deinem Unternehmen.

Du stehst vor einer großen IT-Herausforderung? Wir sorgen für eine maßgeschneiderte Unterstützung. Informiere dich jetzt.

Hilf uns, noch besser zu werden.

Wir sind immer auf der Suche nach neuen Talenten. Auch für dich ist die passende Stelle dabei.

Contact

Send

Cost-effective batch jobs on AWS’ serverless infrastructure

What are the requirements?

Why choose a serverless approach?

How to build the architecture for a serverless batch job?

Who will work with the implemented architecture?

How is the overall architecture structured?

Which role do the individual building blocks play?

The four development components

The two batch job components

What are the alternatives?

Virtual Machines

Serverless Functions

Thanks for reading!

Was this post helpful?

Ja

Blog author

Get in contact

Get in contact

More articles

Höhere Business Agility durch den aktiven Umgang mit Push- und Pull-Systemen...

Schnelles Training eines Recommendation-Modells durch BigQuery ML

Kürzere Time-to-Market für ML-Modelle durch Googles BigQuery ML

Your job at codecentric?

Agile Developer und Consultant (w/d/m)

View Job

More articles in this subject area

Public Cloud im regulierten Sektor: Das ist zu beachten

Green Cloud: Daten und Emissionen sparen

AZ-900-Zertifizierung: Mein How-to!

Mit FinOps die größten Kostenfallen bei AWS S3 verhindern

Cloud FinOps

Mehr Struktur in der Cloud mit Azure Landing Zones

CI/CD-Pipelines mit AWS CDK CodePipeline

Green Cloud: Nachhaltig skalieren

Crossplane: Eine Lösung für hybride Cloud-Herausforderungen?

Green Cloud: Ideen für eine nachhaltigere Architektur

Datenanalyse auf die schnelle Art – mit Amazon Athena und GitLab

Ist die Cloud der große Umweltsünder?

AWS Cloud Development Kit – Infrastructure as Code on Steroids

Infrastructure as Code in AWS: Keine Silver Bullet

AWS CloudFront Functions testen

Die Zukunft der IDEs – aus Sicht eines „Java-EE-Entwicklers“

Green Cloud: Emissionen unserer Cloud-Architektur messen

Terraform Remote State richtig nutzen

Stream Processing mit Kafka Streams und Spring Boot

Kürzere Time-to-Market für ML-Modelle durch Googles BigQuery ML

Gemeinsam bessere Projekte umsetzen.

Wir helfen deinem Unternehmen.

Unsere Leistungen

Hilf uns, noch besser zu werden.

Zu den Jobangeboten