Serverless is a model in which the cloud provider is solely responsible for operating the infrastructure. In the Serverless approach, compute resources are structured into functions, which is why it is also called Functions as a Service (FaaS). The cost of a function is calculated from its execution time and its provisioned size (memory and CPU).
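As a rough illustration of how execution time and memory size combine into cost, the following sketch multiplies invocations, duration and configured memory into GB-seconds and adds a per-request fee. The function name and both prices are hypothetical placeholders chosen to keep the arithmetic readable; they are not current list prices of any provider.

```python
def monthly_function_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_gb_second: float,
    price_per_million_requests: float,
) -> float:
    """Estimate the monthly cost of one function, in the currency of the prices."""
    # Compute is billed as memory (GB) multiplied by execution time (seconds).
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = (invocations / 1_000_000) * price_per_million_requests
    return round(compute_cost + request_cost, 2)


# Hypothetical prices (assumption), not taken from a real price list.
EXAMPLE_PRICE_PER_GB_SECOND = 0.00002
EXAMPLE_PRICE_PER_MILLION_REQUESTS = 0.20

cost = monthly_function_cost(
    invocations=2_000_000,
    avg_duration_ms=500,
    memory_mb=512,
    price_per_gb_second=EXAMPLE_PRICE_PER_GB_SECOND,
    price_per_million_requests=EXAMPLE_PRICE_PER_MILLION_REQUESTS,
)
```

With these assumed numbers, two million invocations of half a second at 512 MB add up to 500,000 GB-seconds. Halving either the duration or the memory size halves the compute part of the bill.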
A function always executes in reaction to an event. Events can come from several sources: HTTP requests, messages from a pub-sub service, uploads, cron jobs and events from other cloud services (e.g. user registration). Functions can also be orchestrated into state machines (e.g. AWS Step Functions).
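To make the "function reacts to an event" idea concrete, here is a minimal sketch of a Lambda-style Python handler that inspects the incoming event to find out which kind of source triggered it. The helper name `classify_event` and the returned labels are illustrative assumptions; the inspected keys follow the event shapes AWS sends for SNS, SQS, S3, API Gateway proxy requests and scheduled rules.

```python
def classify_event(event: dict) -> str:
    """Return a coarse label for the source that triggered the function."""
    records = event.get("Records") or []
    if records:
        first = records[0]
        # SNS records use "EventSource"; SQS and S3 records use "eventSource".
        source = first.get("EventSource") or first.get("eventSource") or ""
        if source.startswith("aws:"):
            return source.split(":", 1)[1]  # "sns", "sqs", "s3", ...
    if "httpMethod" in event:
        return "http"  # API Gateway REST API with Lambda proxy integration
    if event.get("source") == "aws.events" and event.get("detail-type") == "Scheduled Event":
        return "schedule"  # scheduled rule, i.e. a cron job
    return "unknown"


def handler(event: dict, context: object) -> dict:
    """Lambda-style entry point: each invocation is a reaction to one event."""
    return {"handled": classify_event(event)}
```

In practice a function is usually wired to exactly one source, so such a dispatcher is rarely needed; the sketch only shows that the trigger is always visible in the event payload.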
Functions are not the only resources used in Serverless systems. Combined with managed services, they cover nearly everything needed to develop and operate software in the cloud. These services provide databases, messaging, e-mail and SMS delivery, REST APIs, content delivery networks, queuing and secrets management.
Serverless allows us to focus on solving business problems with minimal distraction. If we no longer have to worry about running our own platforms, mail servers, databases and runtimes, this frees us from complex work and considerably reduces labour costs. The challenge that remains is arranging these cloud resources into durable, robust and secure systems.
To address these challenges and get the most out of Serverless applications and managed services, this article will give you some advice for your next Serverless (or even normal public cloud) project.
Before we start: this article is not about the Serverless Framework (one of many ways to implement Serverless apps) but about the execution model and how to harness it properly. Examples use Serverless services from Amazon Web Services (AWS), currently the most mature cloud provider for this purpose.
Staging concept and multi-account strategy
Staging is essential to most software systems, and cloud and Serverless applications are no exception. A common approach is to have three environments for different purposes: development, test and production. Development is the place where features are developed and experimented with. On test (you may prefer to call it pre-production or staging), changes are tested in a production-like environment. If everything is alright, they are delivered to production.
External systems with which our system communicates should ideally also practise staging. This allows each of our environments to be integrated with its counterpart in the other system. Only then is it possible to use the environments for their exact purpose.
What is different about staging in the cloud? We do not roll out to different servers, clusters or orchestrator workspaces but to different cloud accounts. Each environment should live in its own account to be isolated for reasons of security, auditing capabilities, cost transparency, reproducibility and flexibility (creation and deletion).
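The one-account-per-environment rule can be captured in a small piece of pipeline configuration. The sketch below is an assumption about how such a mapping might look: the account IDs, the region and the role name `deployment-role` are placeholders, and only the IAM role ARN format is taken from AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    account_id: str  # placeholder account IDs, one account per stage
    region: str

    def deploy_role_arn(self, role_name: str = "deployment-role") -> str:
        """ARN of the (hypothetical) role a pipeline assumes in this account."""
        return f"arn:aws:iam::{self.account_id}:role/{role_name}"


STAGES = {
    "development": Stage("development", "111111111111", "eu-central-1"),
    "test": Stage("test", "222222222222", "eu-central-1"),
    "production": Stage("production", "333333333333", "eu-central-1"),
}

# Guard against two stages accidentally sharing one account.
if len({stage.account_id for stage in STAGES.values()}) != len(STAGES):
    raise ValueError("each stage must live in its own account")


def resolve_stage(name: str) -> Stage:
    """Look up the deployment target for a stage name."""
    try:
        return STAGES[name]
    except KeyError:
        raise ValueError(f"unknown stage: {name}") from None
```

A pipeline step would call `resolve_stage("test").deploy_role_arn()` and assume that role before rolling out, so the same code reaches a different account per stage.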
Multiple cloud accounts per project are common practice. Cloud providers see it the same way and offer solutions for implementing a multi-account strategy. AWS Organizations lets us create an organisational structure underneath the root account. This usually consists of subject-related or aspect-related organisational units (OUs) into which teams arrange their accounts. To secure these structures and keep them auditable and open to forensic investigation in an emergency, services such as AWS Control Tower, CloudTrail and SSO complete the picture.
With staging and a multi-account strategy, a solid start has been made, also for non-Serverless systems in the cloud. Now the most important structures are available to our project for further shaping. It is time to solve business problems with them.
Business-related deployment units
A deployment unit is a body of source code that belongs together functionally or technically and is rolled out together. An example domain “food” contains the definition and implementation of food-related REST resources. A deployment unit also brings its own infrastructure as code, for example to define database tables and S3 buckets for storing food and images. If other deployment units need to know about newly added food items, we can publish an event to a messaging topic (e.g. with AWS SNS) that consumers can subscribe to. Of course, in order to reproducibly roll out the resources to cloud accounts with our code, we also need a pipeline that tests, builds and rolls out.
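Publishing a "food added" event could look roughly like the sketch below. It only builds the arguments for an SNS publish call, so it runs without cloud access. The event fields (`eventId`, `eventType`, `version`, `occurredAt`, `data`) and the function name are assumptions made for illustration; the `TopicArn`, `Message` and `MessageAttributes` keys match the parameters of the SNS publish operation.

```python
import json
import uuid
from datetime import datetime, timezone


def build_food_added_publish_args(topic_arn: str, food_id: str, name: str) -> dict:
    """Build keyword arguments for an SNS publish call announcing a new food item."""
    event = {
        "eventId": str(uuid.uuid4()),  # lets consumers detect duplicates
        "eventType": "FoodAdded",
        "version": 1,  # bumped only for incompatible changes
        "occurredAt": datetime.now(timezone.utc).isoformat(),
        "data": {"foodId": food_id, "name": name},
    }
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps(event),
        # Attributes allow subscribers to filter without parsing the body.
        "MessageAttributes": {
            "eventType": {"DataType": "String", "StringValue": event["eventType"]},
        },
    }
```

Assuming boto3 is available in the function, the result would be passed on as `sns_client.publish(**args)`. Keeping the envelope construction in a pure function makes it easy to unit test.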
What is the actual size of a “deployment unit”? A unit can be freely defined in terms of its size. From a monolithic unit that rolls out as one, to a microservice cut, to individual functions per unit, everything is possible. In this regard, “Best practices for organizing larger Serverless applications” from AWS is worth reading.
Event-driven communication everywhere
The most elegant interfaces in our Serverless system are those where we can rely on event-driven (asynchronous) communication. Events via messaging (AWS SNS) are the preferred choice if our domains are to communicate with each other. Even when communicating within a domain, decoupling through messaging is almost always superior (compared to direct, synchronous communication between functions).
If domains communicate with each other via events, it looks like this: a function sends a message to a topic via a pub-sub service such as AWS SNS. The consuming function of the other domain can subscribe to the topic and receive and process its messages. There are usually good reasons for putting a queue between the SNS topic subscription and the consuming function. The queue then receives the SNS messages and keeps them until the consumer has processed them successfully or the message expires. If processing fails (for example, because a required third-party system is offline or overloaded), the message is placed back in the queue to be retried later.
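The retry behaviour described above can be sketched as a consumer that reports which messages of a batch failed, so that only those return to the queue. This sketch assumes an SQS queue subscribed to the SNS topic without raw message delivery (so each SQS body wraps the SNS envelope) and an event source mapping with partial batch responses enabled. `TemporaryFailure` and the `process` callback are hypothetical names.

```python
import json
from typing import Callable


class TemporaryFailure(Exception):
    """Signals that processing should be retried later."""


def handle_batch(event: dict, process: Callable[[dict], None]) -> dict:
    """Process a batch of SQS records and report the ones that must be retried."""
    failures = []
    for record in event.get("Records", []):
        try:
            sns_envelope = json.loads(record["body"])  # SQS body holds the SNS envelope
            domain_event = json.loads(sns_envelope["Message"])  # the published event
            process(domain_event)
        except Exception:
            # Anything that goes wrong leaves the message in the queue; it is
            # retried until it succeeds, expires or moves to a dead-letter queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

A Lambda handler would then simply be `lambda event, context: handle_batch(event, process_food_event)`, where `process_food_event` raises `TemporaryFailure` when, for example, a third-party system is unreachable.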
Within a domain, the transmission of messages via SNS can be omitted. In this case a queue is sufficient to decouple the consumer from the producer.
Trade-offs between business and technical dependencies
For the example given, “food” needs an SNS topic “food-events” to inform another deployment unit, “product”, that food has been added. We want to couple producer and consumer loosely (i.e. asynchronously) via messaging. The two business-related deployment units should be robust and tolerant of each other's faults, which also means that we can roll them out independently and in any order. However, if the topic “food-events” were created as part of the rollout of “food”, this independence would be lost, because “product” also depends on the existence of the topic (which is a defined cloud resource).
Often there is at least one deployment unit that creates common technical foundations for the business-related units. We can host shared resources such as messaging topics, DNS records and email configurations in a technical deployment unit called “infrastructure”. This unit always rolls out before the business-related units do, to ensure common preconditions on the target environment. Technically, for example, the precondition for using SNS in AWS is the messaging topic and thus its Amazon Resource Name (ARN). We get that ARN after creating the topic, and producer and consumer then need it to publish and subscribe to events.
With Terraform, the ARN is exposed as an output and persisted in the remote state of “infrastructure”. “food” and “product” read that output and use it to create their IAM policies and subscriptions for the topic.
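In Terraform itself this hand-over is written in HCL; the following Python sketch only illustrates the data flow. It assumes the outputs of “infrastructure” are available in the JSON shape produced by `terraform output -json`, and that the output is named `food_events_topic_arn`, which is a hypothetical name. The policy documents use the standard IAM JSON structure.

```python
import json


def topic_arn_from_outputs(outputs_json: str, output_name: str = "food_events_topic_arn") -> str:
    """Read the topic ARN from Terraform outputs in JSON form."""
    outputs = json.loads(outputs_json)
    return outputs[output_name]["value"]


def publish_policy(topic_arn: str) -> dict:
    """IAM policy document allowing the producer ("food") to publish."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "sns:Publish", "Resource": topic_arn}],
    }


def subscribe_policy(topic_arn: str) -> dict:
    """IAM policy document allowing the consumer ("product") to subscribe."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "sns:Subscribe", "Resource": topic_arn}],
    }
```

The point of the design is that neither “food” nor “product” creates the topic: both only consume an ARN that already exists once “infrastructure” has been rolled out.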
In addition to resources that are used by each stage, there are “shared infrastructure” resources used centrally by all environments (e.g. main web domain, cross-account IAM structures). Shared infrastructure should be rolled out before per-stage infrastructure as it may contain foundations.
Rolling out REST APIs and User Interfaces
The sample application now consists of the business units “food” and “product” as well as the technical units “infrastructure” and “shared infrastructure”. To make our functionality usable from the outside via APIs and user interfaces, AWS offers the API Gateway and CloudFront services.
An API Gateway can be configured with REST resources (also via infrastructure as code). Lambda functions implement each resource (or their individual HTTP methods). CloudFront is great for globally hosting web applications out of an S3 bucket.
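A function behind a REST resource might look like the sketch below, which assumes the Lambda proxy integration of API Gateway: the request arrives as an event with `httpMethod` and `pathParameters`, and the function returns `statusCode`, `headers` and a string `body`. The resource `/food/{id}` and the in-memory `FOOD` dictionary are stand-ins for illustration; a real function would query a database table.

```python
import json

# Hypothetical stand-in for a database table of the "food" domain.
FOOD = {"1": {"id": "1", "name": "apple"}}


def _response(status: int, payload: dict) -> dict:
    """Build a response in the shape the proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),  # the body must be a string
    }


def handler(event: dict, context: object) -> dict:
    """Implements GET /food/{id} for a Lambda proxy integration."""
    if event.get("httpMethod") != "GET":
        return _response(405, {"message": "method not allowed"})
    # pathParameters is null when the route has no parameters.
    food_id = (event.get("pathParameters") or {}).get("id")
    item = FOOD.get(food_id)
    if item is None:
        return _response(404, {"message": "food not found"})
    return _response(200, item)
```

Whether one function serves a whole resource, as here, or a single HTTP method is a matter of how finely the deployment unit is cut.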
A common question in software product development is whether to roll out APIs and user interfaces per business domain or centrally for the entire system. Business-related modularisation aims to roll them out together with the rest of the domain’s business code, but that is a sweeping statement. Instead, the decision should be made case by case. The following criteria can play a role in that decision:
- Technical limitations: an API gateway “occupies” one subdomain (e.g. food.mycorp.com), the definition of the path is solely up to the REST resources. Is it OK for my case to separate the whole REST API via subdomains or is a common (sub) domain better?
- Frontend architecture: Individual frontends that integrate per URL? Individual pages that are integrated on the server or client side by a central UI? A central single-page application that integrates all domains via HTTP?
- Costs: a central API Gateway, CloudFront & S3 alone are not expensive. However, if one wants a REST API and/or UI per domain, this will sooner or later become visible on the invoice as the number of business domains increases.
- Developer Experience: should the REST API be accessed by external parties? Is a REST API separated by subdomains understood and accepted well enough or does it have to live on one exact domain?
Since the central user interface (if there is one) depends on the REST API of the business related deployment units, it forms its own deployment unit. If you also decide to use a central REST API, this also forms its own deployment unit, which should be rolled out before the user interface because of their dependency.
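The rollout order of all deployment units mentioned so far can be derived from their dependencies rather than maintained by hand. The sketch below uses Python's standard `graphlib` module; the unit names and the dependency map are assumptions based on the sample application, including the assumption that a central REST API depends on both business units.

```python
from graphlib import TopologicalSorter

# Each unit maps to the units that must be rolled out before it.
DEPENDENCIES = {
    "shared-infrastructure": set(),
    "infrastructure": {"shared-infrastructure"},
    "food": {"infrastructure"},
    "product": {"infrastructure"},
    "rest-api": {"food", "product"},
    "user-interface": {"rest-api"},
}


def rollout_order(dependencies: dict) -> list:
    """Return the units in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = rollout_order(DEPENDENCIES)
```

The relative order of “food” and “product” is deliberately not fixed, since the two business units are meant to be deployable independently. A circular dependency would raise `graphlib.CycleError`, which is a useful early warning that the units are cut badly.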
Now that REST API(s) and user interface(s) have also been rolled out (initially or in the future for changes/releases), it’s time to run our end-to-end testing against REST API(s) and user interface(s).
Synchronous communication is an exception
Sometimes a function (or another workload) also requires data synchronously from another domain. Instead of only processing events or replicating data between domains, a direct function call also works. An alternative is to collect such internal synchronous functions behind a restricted API Gateway. The gateway and its consumers could be located in an Amazon VPC. An obvious disadvantage: during development it is hard to access resources that are located in a VPC.
All these types of interfaces in and around a Serverless system have something in common: after rollout, producers and consumers have effectively entered into a contract. The producer should keep changes to its events or REST interfaces meticulously backward compatible or use API versioning. To document and ensure these contracts, it is advisable to write end-to-end tests against the APIs. Scripts with an HTTP client and the AWS SDK, or tools like Pact (contract testing), are good for this.
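What "backward compatible" means for an event can be checked mechanically. The following sketch is a deliberately simple stand-in for a contract test and not a replacement for a tool like Pact: it assumes a schema is described as a mapping from field name to type name, and treats removed fields and changed types as breaking while allowing new fields.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List changes in new_schema that would break consumers of old_schema."""
    problems = []
    for field, old_type in old_schema.items():
        if field not in new_schema:
            problems.append(f"removed field: {field}")
        elif new_schema[field] != old_type:
            problems.append(
                f"changed type of {field}: {old_type} -> {new_schema[field]}"
            )
    # Fields that exist only in new_schema are additive and therefore allowed.
    return problems


# Hypothetical schema versions of the "FoodAdded" event.
FOOD_ADDED_V1 = {"foodId": "string", "name": "string"}
FOOD_ADDED_COMPATIBLE = {"foodId": "string", "name": "string", "category": "string"}
FOOD_ADDED_BREAKING = {"foodId": "number"}
```

Run in the producer's pipeline, a non-empty result for a new schema version would fail the build before the change reaches any consumer.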
Like other IT systems, Serverless applications rely on a conscious and clean structure so that they can be developed further in a secure, maintainable and sustainable way.
The structuring process starts by defining a proper multi-account strategy and a staging concept. After that, the focus should be on defining business-related and technical deployment units (even in monolithic approaches). Defining the external (HTTP) and internal (messaging, events) APIs, as well as one or more user interfaces and where they are best placed within the picture, concludes the draft for a freshly baked Serverless system.
Once a team and organisation have understood these basics and learned how to use the cloud services they need, implementation of Serverless systems seems like child’s play.
Misuse of cloud services, bad architecture and incorrectly applied programming will most likely have a negative impact on cloud costs. If costs for managed services suddenly rise sharply, the team should properly review their last design decisions.