Going serverless: How to move files from on-prem SFTP to AWS S3

26.2.2019 | 7 minutes of reading time

Motivation

It is not so rare that we as developers land in a project where the customer uses SFTP (SSH File Transfer Protocol) for exchanging data with their partners. Actually, I can hardly remember a project where SFTP wasn’t in the picture. In my last project, for example, the customer was using field data loggers that were logging and transferring measured values to an on-premises SFTP server. But what was different that time was that we were building a serverless solution in AWS. As you probably know, when an application operates in the AWS serverless world, it is absolutely essential to have your data in S3, so it can easily be used with other AWS services for all kinds of purposes: processing, archiving, analytics, and so on.

What we needed was a mechanism to poll the SFTP server for new files and move them into the S3 bucket. As a result, we built a custom serverless solution with a combination of AWS managed services. It is reasonable to ask why we didn’t use AWS Transfer for SFTP. While the answer is simple (it didn’t exist at that time), I think a custom solution still has value for small businesses, where traffic is not heavy and the SFTP server is already part of the existing platform. If this sounds interesting, keep on reading to find out more.

From SFTP to AWS S3: What you will read about in this post

Custom solution for moving files from SFTP to S3
In-depth description of the architecture
Solution constraints and limitations
Full source code
Infrastructure as Code
Detailed guide on how to run it in AWS
Video instructions

The architecture

Let’s start by briefly explaining what our solution does. It scans an SFTP folder and moves (meaning both copies and deletes) all files from it into an S3 bucket. It doesn’t have to be only one folder/bucket pair: you can configure as many source and destination pairs as you want. Another important question is when it gets executed. It runs on a schedule. You use a Cron expression to schedule the execution, so it is pretty flexible there.

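The solution combines the following AWS services and tools, each of which is covered below: AWS Lambda (Node.js), CloudWatch Events, S3, Parameter Store, and Terraform.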

How it works

CloudWatch Event is scheduled to trigger Lambda, and Lambda is responsible for connecting to SFTP and moving files to their S3 destination.
This approach requires only one Lambda to be deployed, because it is source- (SFTP folder) and destination- (S3 bucket) agnostic. When CloudWatch Event triggers Lambda, it passes the source and destination as parameters. You can deploy a single Lambda, and many CloudWatch Events that will all trigger the same Lambda, but with different source/destination parameters.

Node.js and Lambda: Connect to FTP and download files to AWS S3

The centerpiece is a Node.js Lambda function. It uses the ftp client module to communicate with the FTP server. Every time a CloudWatch Event triggers Lambda, it executes this method:

async execute(event: ImportFilesEvent): Promise<void> {
    // Connection parameters are read from Parameter Store
    const ftpConfig = await this.readFtpConfiguration();
    this.ftp.configure(ftpConfig);
    await this.ftp.connect();
    const files = await this.ftp.list(event.ftp_path);
    for (const ftpFile of files) {
        const fileStream = await this.ftp.get(`${event.ftp_path}/${ftpFile.name}`);
        await this.s3.put(fileStream, ftpFile.name, event.s3_bucket);
        // The source file is deleted only after the upload to S3 has completed
        await this.ftp.delete(`${event.ftp_path}/${ftpFile.name}`);
    }
    this.ftp.disconnect();
}

It iterates through the content of the given folder and moves each file to the S3 bucket. As soon as a file has been uploaded successfully, it is removed from its original location.
Notice event.ftp_path and event.s3_bucket in the code above. They come from the CloudWatch Event Rule definition, which is described in the following section.

CloudWatch Event Rule

CloudWatch Events is scheduled to trigger Lambda by creating CloudWatch Event Rules. Every Rule consists of a Cron expression and an Input Constant. The Input Constant is exactly the mechanism we can use to pass the source and destination.
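To make the pairing of schedule and input more concrete, here is a minimal TypeScript sketch of two rules that target the same Lambda. The rule names, the ScheduledImport type, the toInputConstant helper and the cron schedules are illustrative assumptions; only the ftp_path and s3_bucket fields and the folder and bucket names come from this post. The real rules are created by Terraform, not by this code.

```typescript
// Shape of the Input Constant, as consumed by the handler in this post.
interface ImportFilesEvent {
    ftp_path: string;
    s3_bucket: string;
}

// Hypothetical description of one CloudWatch Event Rule: a schedule plus a constant input.
interface ScheduledImport {
    ruleName: string;           // illustrative name, not taken from the repository
    scheduleExpression: string; // AWS format: cron(minute hour day-of-month month day-of-week year)
    input: ImportFilesEvent;
}

const rules: ScheduledImport[] = [
    {
        ruleName: "import-source-one",
        scheduleExpression: "cron(0/15 * * * ? *)", // every 15 minutes (assumed schedule)
        input: { ftp_path: "source-one", s3_bucket: "destination-one" },
    },
    {
        ruleName: "import-source-two",
        scheduleExpression: "cron(0 6 * * ? *)", // once a day at 06:00 UTC (assumed schedule)
        input: { ftp_path: "source-two", s3_bucket: "destination-two" },
    },
];

// The Input Constant is the JSON text that the rule hands to Lambda as the event.
function toInputConstant(rule: ScheduledImport): string {
    return JSON.stringify(rule.input);
}
```

Both rules point at the same function; only the schedule and the JSON input differ, which is why a single deployed Lambda is enough.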

Now, when you take a look at the signature of the handler, you’ll see an ImportFilesEvent:

const handler: Handler<ImportFilesEvent, void> = async (event: ImportFilesEvent) => {
    console.log(`start execution for event ${JSON.stringify(event)}`);
    ...
};

This is exactly the value of the Input Constant and it is shown in the logged output as:

2019-01-16T07:30:11.430Z ... start execution for event
{
    "ftp_path": "source-one",
    "s3_bucket": "destination-one"
}
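Since TypeScript types are erased at runtime, a rule with a misspelled field would only surface once the FTP or S3 call fails. The following is a small sketch of a runtime check the handler could run first. The function names isImportFilesEvent and requireImportFilesEvent are hypothetical and not part of the repository.

```typescript
interface ImportFilesEvent {
    ftp_path: string;
    s3_bucket: string;
}

// Type guard: true only if both fields are present and non-empty strings.
function isImportFilesEvent(value: unknown): value is ImportFilesEvent {
    if (typeof value !== "object" || value === null) {
        return false;
    }
    const { ftp_path, s3_bucket } = value as { ftp_path?: unknown; s3_bucket?: unknown };
    return (
        typeof ftp_path === "string" && ftp_path.length > 0 &&
        typeof s3_bucket === "string" && s3_bucket.length > 0
    );
}

// Hypothetical helper: fail fast with a readable message instead of a late FTP/S3 error.
function requireImportFilesEvent(value: unknown): ImportFilesEvent {
    if (!isImportFilesEvent(value)) {
        throw new Error(`invalid event: ${JSON.stringify(value)}`);
    }
    return value;
}
```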

FTP connection parameters

FTP connection parameters are stored in another AWS service called Parameter Store. Parameter Store is a convenient place for storing configuration and secret data. The value is stored as JSON:


{
  "host": "18x.xxx.xxx.xx",
  "port": 21,
  "user": "*********",
  "password": "********"
}

When Lambda executes readFtpConfiguration(), it reads the FTP Configuration from Parameter Store.
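As an illustration of what happens after the value has been read, here is a sketch that parses and validates the stored JSON. It is not the repository's readFtpConfiguration(). The function name parseFtpConfiguration is an assumption, and fetching the raw string (for example with the AWS SDK's SSM getParameter call) is deliberately left out so the snippet stays self-contained.

```typescript
// Mirrors the JSON document shown above.
interface FtpConfiguration {
    host: string;
    port: number;
    user: string;
    password: string;
}

// Parses the raw parameter value and fails fast on missing or malformed fields.
// Error messages never include the password.
function parseFtpConfiguration(raw: string): FtpConfiguration {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
        throw new Error("FTP configuration must be a JSON object");
    }
    const { host, port, user, password } = parsed as Partial<Record<keyof FtpConfiguration, unknown>>;
    if (typeof host !== "string" || host.length === 0) {
        throw new Error("FTP configuration: host is missing");
    }
    if (typeof port !== "number" || !Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error("FTP configuration: port must be an integer between 1 and 65535");
    }
    if (typeof user !== "string" || typeof password !== "string") {
        throw new Error("FTP configuration: user and password must be strings");
    }
    return { host, port, user, password };
}
```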

Limitations and constraints

Be aware that this solution neither synchronizes SFTP with S3 nor works in real time. Don’t expect a file to appear in S3 as soon as it is uploaded to SFTP. The transfer runs on a schedule.
Another consideration is how much data it can handle. AWS Lambda has its limitations. Since this solution is built to scan an entire folder and transfer all files from it, Lambda can hit one of its limits if there are too many files or the files are very large. It worked well for us with around a dozen files, none of them larger than a few KB.
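One possible way to stay within those limits is to cap how much a single invocation takes on and leave the rest for the next scheduled run. The sketch below is not part of the original solution. The names planBatch, RemoteFile and BatchLimits are hypothetical, and it assumes the FTP listing reports a file size in bytes.

```typescript
// Minimal view of a directory entry; `size` in bytes is assumed to come from the FTP listing.
interface RemoteFile {
    name: string;
    size: number;
}

// Illustrative budget for one invocation; suitable values depend on the Lambda timeout and memory.
interface BatchLimits {
    maxFiles: number;
    maxTotalBytes: number;
}

// Picks files in listing order, skipping any file that would push the batch over the byte budget.
// Files left out stay on the server and are picked up by a later scheduled run.
function planBatch(files: RemoteFile[], limits: BatchLimits): RemoteFile[] {
    const batch: RemoteFile[] = [];
    let totalBytes = 0;
    for (const file of files) {
        if (batch.length >= limits.maxFiles) {
            break; // file-count budget used up
        }
        if (totalBytes + file.size > limits.maxTotalBytes) {
            continue; // does not fit in this run, try the next (possibly smaller) file
        }
        batch.push(file);
        totalBytes += file.size;
    }
    return batch;
}
```

Note that a single file larger than maxTotalBytes would never be selected by this sketch, so such files need separate handling.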

If there are network issues during a transfer, the Lambda invocation will fail. Since Amazon CloudWatch Events invokes Lambda functions asynchronously, AWS will retry the execution. Still, I encourage you to explore the limits on your own, and let me know in the comments section if you see ways to make it more resilient to failures.
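As one idea for more resilience, the loop could isolate failures per file, so that a single broken transfer does not block the remaining files. The following is a sketch under stated assumptions, not the repository code: the FtpClient and S3Writer interfaces only mirror the method names used earlier in this post, the file body is simplified to a Uint8Array instead of a stream, and moveFiles and moveOrThrow are hypothetical names.

```typescript
// Assumed minimal interfaces; method names mirror the snippet above, types are simplified.
interface FtpClient {
    list(path: string): Promise<{ name: string }[]>;
    get(path: string): Promise<Uint8Array>;
    delete(path: string): Promise<void>;
}

interface S3Writer {
    put(body: Uint8Array, key: string, bucket: string): Promise<void>;
}

interface MoveResult {
    moved: string[];
    failed: string[];
}

async function moveFiles(ftp: FtpClient, s3: S3Writer, ftpPath: string, bucket: string): Promise<MoveResult> {
    const result: MoveResult = { moved: [], failed: [] };
    for (const file of await ftp.list(ftpPath)) {
        const remotePath = `${ftpPath}/${file.name}`;
        try {
            const body = await ftp.get(remotePath);
            await s3.put(body, file.name, bucket);
            // Delete only after the upload succeeded, so a failure never loses data.
            await ftp.delete(remotePath);
            result.moved.push(file.name);
        } catch {
            // Keep going: the file stays on the server and can be retried later.
            result.failed.push(file.name);
        }
    }
    return result;
}

// Throwing at the end marks the invocation as failed, so the asynchronous retry still applies.
async function moveOrThrow(ftp: FtpClient, s3: S3Writer, ftpPath: string, bucket: string): Promise<string[]> {
    const { moved, failed } = await moveFiles(ftp, s3, ftpPath, bucket);
    if (failed.length > 0) {
        throw new Error(`failed to move: ${failed.join(", ")}`);
    }
    return moved;
}
```

Because files that were moved successfully are already deleted from the source, a retried invocation only sees the files that failed.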

Tests are missing. Testing Lambda is another big topic and I wanted to focus on the architecture instead. However, you can refer to another blogpost to find out more about this topic.

Run the code with Terraform

To use Lambda and other AWS services, you need an AWS account. If you don’t have an account, see Create and Activate an AWS Account.
You will also need to install Terraform and Node.js.

When everything is set up, run git clone to get a copy of the repository, where the full source code is shared.

$ git clone git@gitlab.codecentric.de:milica.zivkov/ftp-to-s3-transfer.git

You will run this code in a second. But before that, you’ll need to make two changes. First, open provision/credentials/ftp-configuration.json and enter real SFTP connection parameters. Yes, this means you will need an SFTP server, too. The code will try to download from folders named source-one and source-two, so make sure they exist on the server.
Second, open provision/variables.tf and change the value of the default attribute. AWS requires S3 bucket names to be globally unique, and this parameter is used to achieve that uniqueness.

Next, build the Node.js Lambda package. This produces the Lambda-Deployment.zip required by Terraform.

$ cd move-ftp-files-to-s3
$ npm run build:for:deployment
$ cd dist
$ zip -r Lambda-Deployment.zip . ../node_modules/

When Lambda-Deployment.zip is ready, start creating the infrastructure.

$ cd ../../provision
$ terraform init
$ terraform apply

If you prefer video instructions, have a look here:

You should now see the success message "Apply complete! Resources: 17 added, 0 changed, 0 destroyed." At this point all AWS resources should be created, and you can check them out by logging in to the AWS Console. Navigate to the CloudWatch Event Rule section and look at the schedule to find out when Lambda will be triggered. In the end, you should see files moved from

1. source-one FTP folder –> destination-one-id S3 bucket and
2. source-two FTP folder –> destination-two-id S3 bucket

Summary: Going serverless by moving files from SFTP to AWS S3

This was a presentation of a lightweight and simple solution for moving files from more traditional services to the serverless world. It has its limitations for larger-scale data, but it has proven stable for smaller businesses. I hope it will help you or serve as an idea when you encounter a similar task. Thank you for reading.

Blog author

Milica Živkov
