Performance Analysis of a GraphQL application with Instana

6.3.2020 | 9 minutes of reading time

Modern IT landscapes typically consist of many different microservices. Replacing monoliths brings more complexity, because there are more parts and more dependencies between them.

A key aspect of running these systems is appropriate monitoring that can handle this complexity and observe system performance. It also needs to understand the different forms of communication, such as REST, gRPC and GraphQL.

In this post we will analyse the performance issues of an existing application. In a follow-up blog post we will then present the solution and the resulting performance improvement.

A demo application (www.coolboard.fun) for a GraphQL online course was quickly implemented, set up and running, but then ran into performance issues…

With more and more users, the performance went down faster than expected, resulting in:

Page load times above 1 second: a single page even took up to 15 seconds!
Failing end-to-end browser tests after running into timeouts!

While I was developing the app, I never ran into such issues, so I started wondering:

What was the main bottleneck? Spoiler alert: there is a subtle side effect caused by rate limiting.
Is there an easy way to fix most (the typical 80%) of the performance issues quickly?
Are there at least some low-hanging fruits, since I sometimes tend to be lazy?
And, important in the long term: can we get insights that help us make the right decisions when changing the architecture and building blocks later?

Before searching for the root cause, we need to understand the overall structure of the application: the building blocks of the different (micro)services, their dependencies and how they communicate.

High-level architecture and services

Web (SPA) -> API Server (BFF, Auth) -> Prisma Server (GraphQL – ORM mapping) -> DB

The single-page application (SPA) runs in the browser. It connects to Auth0.com for authentication and accesses the API Server, which provides a specific GraphQL API and handles authentication (aka backend-for-frontend). The API Server can be scaled out easily because it does not keep any session state: authentication is done purely by exchanging JWT auth tokens.
User management and authentication are handled by a separate third-party service, Auth0.com.

The Prisma Server acts as an ORM and provides all the usual CRUD operations as GraphQL operations.

Observation

When I open a single board page with only a small number of card lists (aka lanes), the page loads fast.

But when more pages are opened simultaneously, or when there is more load, performance drops and the page loads noticeably slowly!

At least the board’s title and its list names appear quickly, because they are loaded first.

Then every list of cards gets loaded and the lanes are filled with their cards. Under some load the response times can increase to more than 15 seconds, which is catastrophic!

If you are wondering why the application behaves this way, keep in mind that

the purpose of this application was to demonstrate the use of GraphQL,
development and testing were done on a local dev machine within a Docker environment,
there was neither much load nor any dedicated load testing until the app went live, so the problem was not noticed before, and
finally, it was not obvious that deploying to a cloud service would lead to such bad performance.

Now, I have a suspicion, because I am using the free but limited version of Prisma cloud: the communication between my API-gateway and the Prisma cloud server is somehow throttled (more details later).

But let’s start figuring out how the services communicate with each other with the help of some tools.

Analysis – Apollo Graph Manager

The easiest way to get some metrics was to activate the built-in tracing feature of the Apollo server, which sends query metrics to Apollo Graph Manager. After creating an account and an API key at https://engine.apollographql.com, we can activate tracing in the API-gateway by wrapping the server.

Every request handled by our API-gateway gets logged, after any parameter values have been removed.

This gives us some insight into the communication between the website in the browser and the API-gateway. Even though the free version only keeps the logs of the last 24 hours, it already shows that we ran more than 200 queries while opening the board page 29 times.

The response time of the CardList query ranges from 400 milliseconds to 14 seconds!

Finding 1: There are too many GraphQL requests triggered

One root cause may be the limitation or throttling of our free GraphCool/Prisma cloud server.

So far, we only have metrics for the GraphQL requests sent from the browser to the API-gateway.

We need to dive deeper now and inspect the communication between the API-gateway and the Prisma cloud GraphQL server, in order to understand which queries are slow and where the bottleneck is.

Analysis – APM with tracing

At this point I was looking for an application monitoring tool that understands GraphQL, that is, one that can tell apart the GraphQL queries, which are all sent as ordinary POST requests to the same endpoint (e.g. /graphql).
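To illustrate why the URL alone is not enough, here is a minimal sketch of how requests to a single endpoint could be told apart by their payload. The helper name operationLabel and the sample queries are assumptions made for this example; this is not how Instana™ or Apollo implement it.

```javascript
// Hypothetical helper: derive a label for a GraphQL POST body so that
// requests to the same endpoint can be told apart in metrics.
function operationLabel(body) {
  // Prefer the explicit operationName field of the JSON payload.
  if (body.operationName) {
    return body.operationName;
  }
  // Otherwise fall back to the operation name in the query text, if any.
  const match = /^\s*(query|mutation|subscription)\s+([A-Za-z_][A-Za-z0-9_]*)/.exec(body.query || '');
  return match ? match[2] : 'anonymous';
}

// Three requests to the same URL, distinguished only by their payload.
const requests = [
  { url: '/graphql', body: { operationName: 'Board', query: 'query Board($id: ID!) { board(id: $id) { name } }' } },
  { url: '/graphql', body: { query: 'query CardList($id: ID!) { list(id: $id) { cards { name } } }' } },
  { url: '/graphql', body: { query: '{ me { name } }' } },
];

const labels = requests.map((r) => `${r.url} ${operationLabel(r.body)}`);
console.log(labels);
```

A monitoring tool that only looks at the HTTP method and path would report all three as the same call, which is why GraphQL awareness matters here.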

I checked the well-known tools on the market, but I found only Instana™ capable of tracing this GraphQL communication.
Instana™ also provides end-user monitoring (EUM) together with tracing of the communication between microservices down to database operations, which makes it an ideal tool for our use case.

We will use the SaaS version of Instana™. Additionally, we need to run the Instana™ agent in the same environment as our services. The agent sends the recorded monitoring data to the Instana™ backend.
For this demo I can also start the agent in a local Docker environment and start the API-gateway there.
Compared to the production environment we will get different timings, but that is okay, as we just want to focus on the communication flow for now.

The setup and high-level architecture for our further analysis:

Let’s start with enabling end-user monitoring:
We need to define a website in Instana™ and add the corresponding snippet to the web page, similar to embedding e.g. Google Analytics.

Everything works out of the box. We only need to set the name of the page by injecting this JavaScript call at the end of the web page:
ineum('page', 'main-page')

To get full tracing and monitoring in our API-gateway running on Node.js, we only need to run a few initialization lines before anything else. This activates code injection, so all requests and responses get traced automatically!

Let’s start from the user’s perspective:

Instana™ provides a “website view” where we can see how our boards page gets loaded with all its resources. After filtering for XHR/POST requests, we already see the requests needed for the board’s data:

One request that fetches only the board’s name and its lanes’ titles
An extra request for each lane (= card list)

Although this looks pretty fine (load time below 1 second), the performance gets worse when more users load the board page. We can see this after clicking the button that opens the Analytics page, which shows the backend traces.

We can see all the XHR requests to the API-gateway (at localhost:4000) with their varying response times (in the last column):

We need to dive deeper into one of these traces to figure out how the API-gateway communicates with the Prisma cloud backend.
First, when filtering for all calls, we can again see the varying response times in the right column.

That already gives some indication of our issues!

Then, let’s see what happens in the background by selecting one call. This sheds some light on the communication between the API-gateway and the Prisma cloud backend:

Traces for the browser requesting the initial board metadata:

We find two sequential requests to the Prisma backend, issued by the API-gateway one after another:

First, it is requesting some user information.
In the second request it retrieves the board data from the backend. (In the image above it is selected, so we see the query details on the right side)

Traces for the browser requesting one lane (=card list) with its cards:

Here, we also find an extra request – for some user data – (see the details on the right side)!

In the end, although the frontend sends only 6 GraphQL requests, we end up with more than 12 calls to the Prisma backend!
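The doubling can be reproduced with a small model of the traced behaviour. The following is only a sketch: createBackend and handleRequest are hypothetical stand-ins for the Prisma backend and the API-gateway, and the five lanes are an assumption chosen to match the 6 frontend requests. The model counts exactly two backend calls per request, so it gives the lower bound of what the traces show.

```javascript
// Hypothetical stand-in for the Prisma backend: it only records calls.
function createBackend() {
  const calls = [];
  return {
    calls,
    query(name) {
      calls.push(name);
      return { name };
    },
  };
}

// Hypothetical gateway handler mirroring the traced behaviour:
// every incoming request first looks up the user, then fetches the data.
function handleRequest(backend, operation) {
  backend.query('User'); // extra lookup observed in the traces
  return backend.query(operation); // the query the frontend asked for
}

// One board page: one Board request plus one CardList request per lane.
// Five lanes is an assumption that yields 6 frontend requests.
const backend = createBackend();
const frontendRequests = ['Board', 'CardList', 'CardList', 'CardList', 'CardList', 'CardList'];
frontendRequests.forEach((op) => handleRequest(backend, op));

const userLookups = backend.calls.filter((c) => c === 'User').length;
console.log(`frontend requests: ${frontendRequests.length}`);
console.log(`backend calls: ${backend.calls.length} (${userLookups} user lookups)`);
```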

Finding 2: There are unneeded extra requests by the API-gateway

In the analysis above, we found out that the API-gateway is requesting some unneeded and unexpected extra user data from the database backend (at eu1.prisma.sh), doubling the number of requests.

Because of this, we quickly run into the rate limiting, which causes the varying latency…
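Why a rate limit leads to varying rather than uniformly slow responses can be shown with a simple simulation. This is a sketch under stated assumptions: the limit of 10 requests per second and the 5 simultaneous users are invented numbers, since the actual limits of the free Prisma cloud tier are not given here, and queueDelays is a hypothetical helper.

```javascript
// Simulate a backend that admits at most `limit` requests per window.
// Returns, for each request arriving at time 0, the delay in milliseconds
// before the backend starts working on it.
function queueDelays(requestCount, limit, windowMs) {
  const delays = [];
  for (let i = 0; i < requestCount; i++) {
    // Requests beyond the limit have to wait for a later window.
    delays.push(Math.floor(i / limit) * windowMs);
  }
  return delays;
}

// Assumed numbers: 10 requests per second, 5 users opening a board at once.
const LIMIT = 10;
const WINDOW_MS = 1000;
const users = 5;

const withUserLookup = queueDelays(users * 12, LIMIT, WINDOW_MS);
const withoutUserLookup = queueDelays(users * 6, LIMIT, WINDOW_MS);

const worst = (delays) => Math.max(...delays);
console.log(`12 calls per page: delays from ${withUserLookup[0]} to ${worst(withUserLookup)} ms`);
console.log(`6 calls per page: delays from ${withoutUserLookup[0]} to ${worst(withoutUserLookup)} ms`);
```

In this model the first requests are served immediately while later ones wait for several windows, and halving the number of backend calls shortens the worst-case wait considerably.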

How can we solve this?

We quickly found a performance bottleneck and its cause. The main goal will be to reduce the overall number of GraphQL requests to the backend.

Obviously, even though the architecture and service structure were fully sufficient for a little demo, the best solution is to migrate to a less limited GraphQL persistence service (e.g. FaunaDB) or to host a Prisma backend service on our own.

To fix the performance issues, we could also add caching in the API-gateway or collapse all GraphQL queries into one huge GraphQL query, but this means adapting the application.
Low-hanging fruit: as a quick measure, we should get rid of fetching the extra user information in each request by adapting our API-gateway server!
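One possible way to avoid the repeated user lookup is to cache it per auth token inside the API-gateway. The following is a minimal sketch of that idea, not necessarily the fix applied to the real application; createBackend, createUserLoader and the token value are hypothetical names introduced for this example.

```javascript
// Hypothetical backend stub that only counts calls.
function createBackend() {
  let callCount = 0;
  return {
    get callCount() {
      return callCount;
    },
    async query(name, args) {
      callCount += 1;
      return { name, args };
    },
  };
}

// Cache user lookups per auth token so repeated requests reuse the result.
function createUserLoader(backend) {
  const cache = new Map();
  return function loadUser(token) {
    if (!cache.has(token)) {
      // Store the promise so concurrent requests share a single lookup.
      cache.set(token, backend.query('User', { token }));
    }
    return cache.get(token);
  };
}

async function main() {
  const backend = createBackend();
  const loadUser = createUserLoader(backend);
  const operations = ['Board', 'CardList', 'CardList', 'CardList', 'CardList', 'CardList'];

  // Each frontend request still needs the user, but only the first one
  // actually hits the backend for it.
  await Promise.all(operations.map(async (op) => {
    await loadUser('token-abc');
    return backend.query(op, {});
  }));

  console.log(`backend calls: ${backend.callCount}`);
  return backend.callCount;
}

const result = main();
```

A real cache would also need an expiry or invalidation strategy, so that changed user data and revoked tokens are not served from stale entries.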

Conclusion

To find performance issues, it is necessary to have the right tools: not only to monitor performance, but also to analyse it easily and find problems quickly.
The Apollo Engine helped to get some quick statistics first, but it is limited to GraphQL-specific operations.
With Instana™ we additionally get a bigger, more detailed picture, and we can also find the bottleneck in the communication of the whole system, via GraphQL and other protocols.

For this post we used Instana™ to detect the performance issues with only a limited view of a part of the system. You can imagine how effective this can be when it is used within the whole production system, monitoring all its parts, and when you can also use its advanced alerting features!

If you are interested in more details and want to try Instana™, you can run a full-featured 14-day trial version. There is also this post about how to install Instana on a Kubernetes cluster (German).
You could also request a demo or run a PoC together with the APM team.

As we now have an idea of what the root problem is, we can improve the performance by reducing the load on the backend by 50% with only little effort. Check out part two: Performance Optimization of a GraphQL app with Instana.

Blog author

Robert Hostlowsky
