Use serverless AWS Step Functions to reduce VPC costs


Recently a customer (big in the music festival business) asked for a cloud solution that supports continuous reporting on administrative business workflows. Because of the seasonal nature of the customer’s business, the architecture had to be highly available and had to scale up and down daily by as much as 500%. Serverless Lambdas initially looked like a suitable fit for these requirements. In the end, we used AWS Step Functions to lower our VPC costs.

In this article I will take you on a tour from our initial, naive Lambda setup to an improved one, and explain what went wrong along the way. Ultimately we chose AWS Step Functions to coordinate the Lambda workflow, with the added benefit that a workflow can cross VPC boundaries. This let us optimise our use of VPC components. Read about our journey below!

Initial Serverless VPC architecture

Our solution processes files that an external system puts on S3. A Lambda function listens to S3 events, takes each new file, transforms its content, and loads it into an RDS database so that QuickSight can report on it. The AWS architecture is designed as follows.
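To make the trigger concrete, here is a minimal sketch of how such a Lambda might pick the file location out of an S3 event notification. The function names `extract_s3_objects` and `handler` are hypothetical and not taken from our code base, and the transform and load steps are left out on purpose.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications,
        # with spaces encoded as '+'.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point; the real transform/load is omitted."""
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        # Placeholder for: read the file, transform it, load it into RDS.
        print(f"would process s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

Keeping the event parsing in its own small function makes it easy to test locally with a hand-written event, without touching S3 or RDS.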

AWS serverless cloud infrastructure architecture

After a few weeks, our solution fell short. Time-outs caused S3 reads to fail, which left processing stuck in an infinite loop. Even in the happy flow, the entire read, transform, and load sequence was unacceptably slow.

Improved Lambda data processing architecture

To speed up the processing, we knew we had to break the steps up into parallel executions. RDS was under little stress, so there was room to shorten ingestion time. After breaking the execution up, it would conceptually look like the following, where A, B, and C are different operations that can each be parallelised further.

Lambda execution

One of the biggest challenges in splitting work into parallel executions is that it requires a lot more orchestration and coordination. At first we thought about using SQS to orchestrate the work, but that would have required significant rework to keep state somewhere (for example in DynamoDB) so that the workflow is aware of its own progress. The added boilerplate was not appealing either. AWS Step Functions was the second option we considered, and it supports exactly this. The more complicated set-up also introduced an additional requirement: sending a status notification back to the origin service to report a successful data transformation. However, because the complete set-up was running inside a VPC, this would have meant adding a NAT gateway for outbound internet access. That leads to pay-by-the-hour expenses, which in my view do not match the serverless pricing philosophy. Having RDS is a necessity for our QuickSight reporting, but with our seasonal load we stick to a pay-for-use model as much as possible.

Luckily, Step Functions places no restrictions on where Lambdas are located. You can place some Lambdas in a VPC (even in private subnets with zero internet access) close to secure resources, and invoke non-VPC Lambda functions, which reach the internet at zero fixed cost, in other steps!
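As an illustration, a workflow of this shape could be described in the Amazon States Language roughly as below. This is a sketch under assumptions, not our actual definition: the region, account id, function names (`operation-a`, `operation-b`, `operation-c`, `notify-origin`) and retry settings are all hypothetical. The three parallel tasks would be Lambdas inside the VPC, while the notification task would be a Lambda outside it.

```python
import json

# Hypothetical ARN template: region, account id and function names are
# placeholders, not values from the original project.
ARN_TEMPLATE = "arn:aws:lambda:eu-west-1:123456789012:function:{name}"


def task(function_name):
    """Build a Task state that invokes one Lambda, with a bounded retry."""
    return {
        "Type": "Task",
        "Resource": ARN_TEMPLATE.format(name=function_name),
        # A bounded retry so a failing step cannot loop forever.
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 5,
            "MaxAttempts": 2,
            "BackoffRate": 2.0,
        }],
    }


def branch(state_name, function_name):
    """Build a single-step branch for a Parallel state."""
    return {
        "StartAt": state_name,
        "States": {state_name: {**task(function_name), "End": True}},
    }


def build_definition():
    """Operations A, B and C run in parallel (VPC Lambdas), then a
    non-VPC Lambda notifies the origin service."""
    return {
        "Comment": "Sketch: parallel processing followed by a notification",
        "StartAt": "ProcessInParallel",
        "States": {
            "ProcessInParallel": {
                "Type": "Parallel",
                "Branches": [
                    branch("OperationA", "operation-a"),
                    branch("OperationB", "operation-b"),
                    branch("OperationC", "operation-c"),
                ],
                "Next": "NotifyOrigin",
            },
            # Runs outside the VPC, so it needs no NAT gateway.
            "NotifyOrigin": {**task("notify-origin"), "End": True},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

The state machine itself does not live in the VPC. Each task is just an invocation of a Lambda by its ARN, which is why VPC and non-VPC functions can be mixed freely in one workflow.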

Our new setup looks like the following architecture.
Renewed AWS architecture with AWS Step Functions

Reducing VPC costs with AWS Step Functions

We split one huge Lambda invocation, which ran up to the maximum execution time (15 minutes), into parallel processing steps. Lambdas are billed per 100 ms, so splitting one Lambda into five can cost up to an additional 400 ms of billed time per invocation due to rounding. However, each workload can now be sized to exactly the right resource utilisation in terms of memory and time. Every smaller run is also a tad more predictable in duration (smaller variation), and memory use is quite consistent between runs, which makes for easier tuning. Our biggest payoff was that we could drop the NAT gateway, which alone pays for 500 million Lambda requests (100 ms, 512 MB).
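The rounding overhead of the split can be checked with a small calculation. This sketch assumes the 100 ms billing increment mentioned above; the durations in the example are made up for illustration.

```python
BILLING_INCREMENT_MS = 100  # billing increment as stated in this article


def billed_ms(duration_ms):
    """Round a duration up to the next full billing increment."""
    increments = -(-duration_ms // BILLING_INCREMENT_MS)  # ceiling division
    return increments * BILLING_INCREMENT_MS


def extra_billed_ms(split_durations_ms):
    """Extra billed time when one run is split into several shorter runs
    with the same total duration."""
    single_run = billed_ms(sum(split_durations_ms))
    split_runs = sum(billed_ms(d) for d in split_durations_ms)
    return split_runs - single_run


# Worst case for five Lambdas: each one runs just past an increment boundary.
worst_case = extra_billed_ms([1001, 1001, 1001, 1001, 1001])
```

With these illustrative durations, the single run of 5005 ms is billed as 5100 ms, while the five runs of 1001 ms are billed as 1100 ms each, so `worst_case` comes out at 400 ms.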

In practice, some resources place constraints on where Lambdas can run, forcing them into private subnets. Security-wise I am happy that this is possible, but cost-wise it comes with a lot of added expenses. As I have shown, you can avoid these VPC costs by using Step Functions. Other event chaining sources such as S3, SQS, and Kinesis share this quality. However, AWS Step Functions is the only one that actually helps you with orchestration, which makes it my tool of choice.

References

AWS step functions: https://aws.amazon.com/step-functions/

Kevin van Ingen

Kevin has a background in software engineering, economics and information science. After working as a development freelancer and teacher he returned to apply a broad multidisciplinary perspective to development projects.
