Performance optimization of a GraphQL app with Instana

21.7.2020 | 7 minutes of reading time

“Works on my machine.” Okay, but we know quite well that software never behaves the same on different machines… We knew that, and still ran into unexpected performance issues when going live with a simple app. Here’s how we fixed the problem and improved performance.

This is about an existing GraphQL application, www.coolboard.fun, a kanban board app (a Trello clone). It ran terribly slowly after going live, because of performance issues caused by a rate-limited backend.

Now that the root cause has been found (see the post from Robert Hostlowsky), the app is ready to be optimized, and we will see the improvements in the results.

Why did we not notice it earlier while developing? We were focused on delivering features, and when testing we were the only users. With more users, more problems emerged!

With appropriate monitoring we were able to find the bottleneck quickly. As described in detail in the previous blog post, it was caused by a simple design flaw that was not visible during development but was easily found in production with Instana.

In this post we will describe how the load can easily be reduced by 50% and how the performance can greatly be improved.

We will remove a bottleneck in the API gateway.

Root cause

As mentioned in the previous post, our API gateway always fires one additional GraphQL request for user data whenever the frontend fetches any data. You might already guess that this is somehow related to authentication, right? As we will see, that is the right direction.

Inefficient Authorisation check

In GraphQL, the resolvers are the “workers” that fetch and provide each piece of data.
Each query field can have its own resolver method.
Each resolver method can implement a gate for checking authorisation: e.g. only an admin can “see” everything.
For each request there is a context which holds request-specific information, e.g. all HTTP request headers.
A context can also provide access to global services.
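To make these points concrete, here is a minimal sketch of a resolver with an authorisation gate. The field name `adminReport`, the `context.user` object and the `context.services.reports` service are hypothetical and not part of the coolboard code; they only illustrate the four resolver arguments and the role of the context.

```javascript
// Hypothetical resolver map; `adminReport`, `context.user` and
// `context.services.reports` are illustrative names only.
const resolvers = {
  Query: {
    // Every resolver receives the same four arguments.
    adminReport(parent, args, context, info) {
      // Gate: only an admin may "see" this field.
      if (!context.user || context.user.role !== 'admin') {
        throw new Error('Not authorised')
      }
      // The context also gives access to shared services.
      return context.services.reports.load(args.id)
    },
  },
}

// The context is built once per request, e.g. from the HTTP request.
const buildContext = (request, services) => ({
  headers: request.headers,
  user: request.user ?? null,
  services,
})
```

Because the gate lives inside the resolver, whatever it does (including any database lookup) runs once for every resolved query field. That detail matters for the rest of this post.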

In our application, all resolvers check the JWT OAuth token and whether a user with that auth ID exists in the database.

In our first implementation this helper function getUserId() checks the authorization:

const getUserId = async (context) => {

   // 1. verify the authentication token (stored in `context`) and retrieve the authenticationID,
  const authenticationID = await retrieveAuthHeaderToken(context)  // 1

  if (authenticationID) {
  
    // 2. lookup the user with this authenticationID,
    const user = await context.db.query.user({where: { authenticationID }}) // 2
    if (user) {

      // 3. retrieve the user's database Id
      return user.id // 3
    }
  }
  
  throw new AuthError()
}

And this is how we used this helper in our GraphQL resolvers:

// resolvers/query.js

export const resolvers = {

  // get currently logged-in user
  async currentUser(parent, args, context, info) {
  
    const id = await getUserId(context)    // 1
    return context.db.query.user({ where: { id } }, info)
  },

  // get any specific board
  async board(parent, boardId, context, info) {
    await getUserId(context)    // 2
    
    return context.db.query.board({ where: { id: boardId } }, info)
  },

  async cardlist(parent, { where }, context, info) {
    await getUserId(context)    // 2
    
    return context.db.query.list({ where }, info)
  }
}

At first sight, this implementation seems to be correct. It is blocking any non-authenticated access.

“It works”… “Done!”… “Wait?!?”…

Can you spot the mistake?

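The currentUser() resolver calls getUserId(), which loads the user from the database just to get the user’s id, and then loads the same user from the database a second time.
The board() and cardlist() resolvers do not need the user’s id at all, so why do they look up the user in the database? That lookup was not required.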
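The effect can be made visible with a small counting experiment. The following is a sketch only: the backend client is replaced by a stub that counts calls, `retrieveAuthHeaderToken` is a stand-in that always succeeds, and `countBackendCalls` as well as the `args.id` argument name are assumptions made for this example.

```javascript
// Stub standing in for the backend client; it only counts calls.
const calls = { user: 0, board: 0 }
const context = {
  db: {
    query: {
      user: async () => { calls.user += 1; return { id: 'u1' } },
      board: async () => { calls.board += 1; return { id: 'b1' } },
    },
  },
}

// Stand-in for the token check; always succeeds in this sketch.
const retrieveAuthHeaderToken = async () => 'auth0|123'

// First implementation: the auth check includes a user lookup.
const getUserId = async (context) => {
  const authenticationID = await retrieveAuthHeaderToken(context)
  if (authenticationID) {
    const user = await context.db.query.user({ where: { authenticationID } })
    if (user) return user.id
  }
  throw new Error('Not authorised')
}

const currentUser = async (parent, args, context, info) => {
  const id = await getUserId(context)
  return context.db.query.user({ where: { id } }, info)
}

const board = async (parent, args, context, info) => {
  await getUserId(context)
  return context.db.query.board({ where: { id: args.id } }, info)
}

// Resolve one current-user query and one board query, then report the counts.
const countBackendCalls = async () => {
  await currentUser(null, {}, context, null)
  await board(null, { id: 'b1' }, context, null)
  return calls
}
```

Running `countBackendCalls()` once resolves two frontend queries but counts three user lookups and one board lookup, i.e. four backend calls instead of two.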

Authentication-check improved

We extract a function that only verifies that the OAuth token from the HTTP header is valid: ensureAuth0TokenValid()

Then we do the user lookup directly in the resolver itself, after extracting the authentication ID (part of OAuth token). The adapted resolvers are now:

// resolvers/query.js

export const Query = {
  async currentUser(parent, args, context, info) {
    // checks token from request header, and extracts oauth-id
    const authenticationID = await retrieveAuthHeaderToken(context)
    return await context.db.query.user({where: { authenticationID }})    
  },
  
  async board(parent, boardId, context, info) {
    // checks token from request header
    await ensureAuth0TokenValid(context)
    return context.db.query.board({ where: { id: boardId } }, info)
  },

  async cardlist(parent, { where }, context, info) {
    // checks token from request header
    await ensureAuth0TokenValid(context)    
    return context.db.query.list({ where }, info)
  }
}
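The two helpers used above, retrieveAuthHeaderToken() and ensureAuth0TokenValid(), are not shown in this post. A minimal sketch of how they could look follows. It rests on several assumptions: the headers are available as `context.headers`, the token arrives as a `Bearer` token, the authentication ID is the token's `sub` claim, and the actual JWT verification is passed in as a hypothetical `verifyToken` function. Unlike the original helper, this version throws instead of returning an empty value.

```javascript
// Hypothetical sketch: `verifyToken` is injected so that the real check
// (e.g. verifying the JWT signature) stays outside this example.
const createAuthHelpers = (verifyToken) => {
  // Returns the authentication ID from the token, or throws.
  const retrieveAuthHeaderToken = async (context) => {
    const header = context.headers?.authorization ?? ''
    const [scheme, token] = header.split(' ')
    if (scheme !== 'Bearer' || !token) {
      throw new Error('Not authorised: missing bearer token')
    }
    // verifyToken is expected to throw on an invalid token
    const payload = await verifyToken(token)
    if (!payload || !payload.sub) {
      throw new Error('Not authorised: token has no subject')
    }
    return payload.sub
  }

  // Only verifies the token; there is no database access here.
  const ensureAuth0TokenValid = async (context) => {
    await retrieveAuthHeaderToken(context)
  }

  return { retrieveAuthHeaderToken, ensureAuth0TokenValid }
}
```

The point of the split is that ensureAuth0TokenValid() never touches the backend, so the board and card-list resolvers no longer cause an extra user query.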

There is another simple optimization, possible because the mapping from authentication ID to user ID never changes.

We can hold that information in a lookup table. Each server instance has to load the info once in its lifecycle, so this is less ideal with serverless lambdas, where instances are short-lived.

// server.js
const userIdByAuthIdLookup = {};

export const lookupUserWithAuthId = async (authenticationID) => {
  // return the cached user, or load it once and remember it for later calls
  return userIdByAuthIdLookup[authenticationID] ??
    (userIdByAuthIdLookup[authenticationID] = await db.query.user({ where: { authenticationID } }))
}

This simplifies the resolver:

  // ...
  async currentUser(parent, args, context, info) {
    const authenticationID = await retrieveAuthHeaderToken(context)
    return lookupUserWithAuthId(authenticationID)
  }

As a side effect, this can also be used to optimize our GraphQL mutations that need the user’s id:

// mutations.js
export const Mutations = {
    async createBoard(parent, { name }, context, info) {
        const authenticationID = await retrieveAuthHeaderToken(context)
        
        // lookupUserWithAuthId is async, so await the user before reading its id
        const { id: userId } = await context.lookupUserWithAuthId(authenticationID)
        
        return context.db.mutation.createBoard({ data: { name, createdBy: userId } }, info)
    }
}
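A note on the lookup table in server.js: it stores the result only after the first lookup has finished, so several requests for the same user that arrive at the same time would each still hit the backend. One possible variation (not what we run, just a sketch) is to cache the pending promise instead. `createUserLookup` and `loadUser` are hypothetical names; `loadUser` stands in for the db.query.user() call.

```javascript
// Variation on the lookup table: cache the pending promise, not the result.
// `loadUser` is a hypothetical stand-in for db.query.user(...).
const createUserLookup = (loadUser) => {
  const userByAuthId = new Map()

  return (authenticationID) => {
    if (!userByAuthId.has(authenticationID)) {
      const pending = loadUser(authenticationID).then(
        (user) => {
          // unknown users are not cached, so a later sign-up is still found
          if (!user) userByAuthId.delete(authenticationID)
          return user
        },
        (error) => {
          // failed lookups are not cached either
          userByAuthId.delete(authenticationID)
          throw error
        }
      )
      userByAuthId.set(authenticationID, pending)
    }
    return userByAuthId.get(authenticationID)
  }
}
```

With this version, concurrent calls for the same authentication ID share one backend request. Like the original table, it never expires entries, which is fine only as long as the mapping really does not change.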

Summary:

For query operations, we replaced the authentication check (token verification plus user lookup) with just the OAuth token verification.
We removed an unneeded database access for retrieving the data of the currently logged-in user.
We are now caching the result of the User lookup.

Verification

Let’s run this simple reproducible scenario:

We will open the board page in the browser 10 times, with a 2-second delay between openings. This creates “enough load”: the loading needs up to one minute to finish.

#!/bin/bash
for i in {1..10}; do
  open https://localhost:3000/board/ck5sc7nis74vk0901gvvr42hi
  sleep 2
done

Then we wait one minute to get out of the rate-limiting time slot, and repeat the run. Repeating it once more gives us more solid stats (with the script above saved as openBoardPage_10times):

openBoardPage_10times
sleep 60
openBoardPage_10times
sleep 60
openBoardPage_10times

This generates 3 sections we will see in the charts in our results below.

Expected result and comparison

After logging in once in the browser, we stay authenticated for the whole test. So, effectively, our frontend opens 3 times 10 board pages.

Our frontend will send these GraphQL requests to the API gateway server:

30 board queries
30 current-user queries
150 (=30*5) cards-list queries.

Our API gateway will send requests to the GraphQL backend:

-> Before optimization this led to 420 calls in total:

30 user queries + (30 user queries for auth check)
30 board queries + (30 user-queries for auth check)
150 card-list queries + (150 user-queries for auth check)

-> After optimization this results in only 180 calls.

The reduced number of calls is visible in the chart: all responses arrive within a shorter time, see (A).

With fewer requests, fewer of them are kept waiting by the rate limiting, and the latency goes down (B).

In the end, we saved more than 50% of the load on our backend by reducing the number of requests from 420 to 180!
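The arithmetic behind these numbers can be written down as a short sketch. The figure of 420 follows directly from the list above. The figure of 180 is taken from the text; the breakdown used here (current-user queries answered from the lookup table, so only board and card-list queries reach the backend) is one reading that matches it and should be treated as an assumption.

```javascript
// Numbers taken from the test scenario described above.
const pages = 3 * 10 // 3 runs with 10 board pages each

const frontendQueries = {
  board: pages * 1,       // 30
  currentUser: pages * 1, // 30
  cardList: pages * 5,    // 150
}

const sum = (values) => values.reduce((total, n) => total + n, 0)

// Before: every frontend query triggers one extra user query for the auth check.
const before = sum(Object.values(frontendQueries)) * 2

// After (assumed breakdown): current-user queries come from the lookup table,
// only board and card-list queries are sent to the backend.
const after = frontendQueries.board + frontendQueries.cardList

// Share of backend calls saved, in percent.
const savedPercent = Math.round((1 - after / before) * 100)
```

This gives 420 calls before and 180 calls after, which is a saving of roughly 57%.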

Latencies

After optimization, the requests sent from the browser have lower latency than before.

Here, Instana gives us more interesting insights:

The (GraphQL) requests sent from our website to the API gateway show that the response times for loading the data on the page go down by a factor of 3. Overall, all pages load in a shorter time.

Page load times for GraphQL requests

And finally the improved page load times reflect the improvement:

Here we see the result of our performance optimization: much smaller retrieval times!

Looking forward

The performance improvement shown was just a simple optimization to reduce the bottleneck.
We could try to optimize further: e.g. why not share the user id with the client and have it sent together with the auth header? In the end that would only save us one extra lookup for some GraphQL operations, but it would make the app insecure, because it exposes internal data. (Don’t do that!)
Beyond that, we have reached the natural performance limit set by the current backend.

What we learned lets us weigh the alternatives more wisely, for example:

The quick solution: raise the limit by paying more for the Prisma Cloud service.
We could host the Prisma database and run the Prisma server on our own.
We could change the API to allow loading the whole board with only one GraphQL request. This also affects the UI frontend.

Result, outcome and impact

We found that, once we understood the issue, improving the performance was not difficult. While the problem was hidden in development, it became noticeable in production, where it was triggered by a third-party system.

Such small issues can easily slip through during design and implementation.
We used appropriate monitoring and tooling to easily locate it (part 1).
We verified the results of our optimization.

Even though this was only a small project, you can imagine how difficult troubleshooting can get on a bigger system or more distributed systems!

If you are interested in more details and want to try Instana yourself, you can run a full-featured 14-day trial version.

There is also this post about how to install Instana on a Kubernetes cluster (German). You could also request a demo or run a proof of concept together with the APM team.

Blog author

Maximilian Mayer

Senior IT Consultant
