How to secure a GraphQL service using persisted queries


GraphQL is a rising query language that gives clients the power to ask for exactly what they need and get it in a single request. In theory this leads to efficient and flexible client-server communication. But adopting new technology always comes with new challenges. One challenge we recently dealt with was limiting the data exposed by an existing GraphQL server. In this tutorial you will learn how persisted queries can help improve the security and performance of your application without sacrificing the great developer experience of GraphQL and its tooling.

What are persisted queries and what makes them secure?

Persisted GraphQL queries change the communication between a GraphQL client and its server. Instead of the whole query, the client only sends a hash of it to the server. The server has a list of known hashes and looks up the corresponding query. This improves performance as well as security, as the server only responds to a limited list of queries.

Why do we need to make our queries more secure?

Persisting queries reduces the exposed interface and data to what your application really needs. Imagine using a traditional CMS like Drupal via its GraphQL interface. Do you really want all data to be publicly accessible? Including the email addresses of your author accounts? Drupal might provide a way to secure that data at some point. But in times when new security vulnerabilities pop up constantly, I would rather move the whole Drupal service into a secured network than apply the next security patch the night it gets released.

What are the limits of this approach?

If you have a public GraphQL API that is used by multiple clients outside of your responsibility, this approach is not for you. The whole point of securing your application this way is that your server already knows which queries to expect.

Using persisted queries will also not limit the exposed data based on the current user. That remains the responsibility of the server. However, there are ways to extend the solution we are going to build to achieve this. Is that worth another blog post? Feel free to leave a comment below and let me know what problems you are currently dealing with.

Why another tutorial?

Searching for persisted queries turns up a bunch of tutorials. Why do we need a new one? A common approach is to use the persistgraphql library, which is archived and hasn't been updated since 2018. The official approach as of now is automatic persisted queries, which optimizes query performance on the fly. But there is no security benefit in automatically hashing every query that pops up on the server. Reducing the allowed queries is now called safelisting and is part of the paid Apollo Platform plans. So no more simple way of securing my queries? Hold on.

Let’s get started implementing persisted queries

The idea is slightly different from what is out there using libraries like persistgraphql, but it is pretty straightforward, and the image below shows what we will do.
Schema of persisted GraphQL queries

Instead of talking directly to the GraphQL service from the client, we will introduce a server in between. This server will handle all GraphQL requests, but only allow known queries. This works because the client and the server share the same query sources and create the hashes when importing them. Creating the hashes directly from the source code shipped with the server ensures that only intended queries get a valid hash. These hashes are used for the public communication between client and server. The server knows the correct query for a valid hash and sends it to the GraphQL service. The GraphQL service can now safely limit its access to requests from the new server.

We will start on the client side by creating a hash and sending it instead of the whole query. Then we will switch over to the server and handle the sent hash. Using JavaScript on both the client and the server pays off here, as we can use the same approach to create the hashes.

If you want to follow along or jump right into the source code, feel free to check out this repository with the complete example implementation.

Used libraries and tools

We will not reinvent the wheel; where possible, we will use existing open-source libraries.

Create and send the hash in the client

We need to create a unique hash for every query we are going to send. Therefore we use the Apollo link for persisted queries which does exactly that out of the box:

import { ApolloClient } from "apollo-client"
import { createHttpLink } from "apollo-link-http"
import { InMemoryCache } from "apollo-cache-inmemory"
import { ApolloLink } from "apollo-link"
import { createPersistedQueryLink } from "apollo-link-persisted-queries"
 
const httpLink = createHttpLink({
  uri: "/graphql",
})
const automaticPersistedQueryLink = createPersistedQueryLink()
const apolloClient = new ApolloClient({
  link: ApolloLink.from([automaticPersistedQueryLink, httpLink]),
  cache: new InMemoryCache(),
})

Instead of sending the full query, Apollo will now only send its hash in the POST body:

{
  "extensions":{
    "persistedQuery":{
      "version":1,
      "sha256Hash":"fcf31818e50ac3e818ca4bdbc433d6ab73176f0b9d5f9d5ad17e200cdab6fba4"
    }
  }
}

Note: You can now also switch to GET requests, which helps if you want to cache your query results (see the Apollo link options).
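For reference, a sketch of that configuration, using the option name documented for apollo-link-persisted-queries:

```javascript
// Send hashed queries as GET requests so intermediate HTTP caches
// can store the responses; mutations still go out as POST.
const automaticPersistedQueryLink = createPersistedQueryLink({
  useGETForHashedQueries: true,
})
```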

One drawback of the stated Apollo link is that it is built primarily for performance and creates hashes on the fly. A side effect of this is that variables are also part of the hash, to provide a unique response for every hash. This will not work for us, as we do not always know all the dynamic variables at build time. Therefore we need to modify the hash function to include only the static sources. We make use of a webpack loader called graphql-persisted-document-loader, which automatically creates a hash for every imported query source.

Make sure to place graphql-persisted-document-loader before graphql-tag/loader, as we need to create the hash after all required fragments of the query have been resolved. (Yes, webpack applies loaders in reverse order, from right to left…)

module.exports = {
  module: {
    rules: [
      {
        test: /\.graphql$/,
        exclude: /node_modules/,
        use: ["graphql-persisted-document-loader", "graphql-tag/loader"],
      },
    ],
  },
}

Now we need to tell our Apollo link to use the created hash, which is stored in the query as documentId.

const automaticPersistedQueryLink = createPersistedQueryLink({
  generateHash: ({ documentId }) => documentId,
})

That’s it for the client. Now every GraphQL request that is sent contains a hash instead of the query.

Note: One side effect of using the Apollo link for persisted queries is that the client automatically resends the request with the full query if the server was not able to resolve the hash. You can create your own link to avoid this, but it might not be a big deal, since the second request will fail as well when the server does not handle full queries. In fact, you could use this as a fallback mechanism when switching an existing application to persisted queries: make sure everything works as expected before blocking requests that contain full queries.
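If you do want to suppress that fallback entirely, a custom link could send only the hash. The following is a rough, untested sketch under the assumption that your imported documents carry the loader's documentId; it relies on the includeQuery/includeExtensions context options of apollo-link-http:

```javascript
import { ApolloLink } from "apollo-link"

// Sketch of a hash-only link: attach the persistedQuery extension and tell
// apollo-link-http to omit the full query from the request body.
const hashOnlyLink = new ApolloLink((operation, forward) => {
  operation.extensions.persistedQuery = {
    version: 1,
    sha256Hash: operation.query.documentId,
  }
  operation.setContext({
    http: { includeQuery: false, includeExtensions: true },
  })
  return forward(operation)
})
```

With a link like this in front of the http link, a hash the server cannot resolve simply results in an error instead of a second request.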

Resolve the hash to its query on the server

Instead of the full query, our GraphQL server now only receives its hash. So it needs to know which query belongs to that hash. The trick to getting exactly the same hash as the client is to use the same mechanism: we will load the same queries using the same webpack loaders.

There are several ways of implementing this concept on the server. You could create a middleware for your GraphQL service, add the functionality to your existing backend, introduce a new microservice, or extend a server that already combines several GraphQL services using Apollo Federation. We chose the microservice approach, as we work with a legacy GraphQL service that we want to hide completely from the public. That is the setting for the server part of this tutorial; however, the concept is the same for the other approaches.

The new service uses the same GraphQL loaders as the client above. Only the way we import the queries is slightly different, as we need to load all possible queries to create a map from each hash to its query. The following code runs on server startup and imports them all. We use a monorepo, so we can import the queries straight from the client folder.

const queries = require.context("../../client/queries", false, /\.graphql$/)
 
const resolvedQueries = queries.keys().map((key) => queries(key))
 
module.exports = resolvedQueries

The next step is to create the endpoint that uses these queries. We will build a small Express service for that.

const express = require("express")
const app = express()
const GraphqlRequestHandler = require("./graphqlRequestHandler")
const queries = require("./loadQueries")
 
app.use(express.json())
 
app.post("/graphql", new GraphqlRequestHandler(queries))
 
app.listen(8082, function () {
  console.log("Example app listening on port 8082!")
})

The queries are loaded once on server startup and passed to the handler. Its constructor creates a map from each hash to its query and returns the actual handler function.

module.exports = class GraphqlRequestHandler {
  constructor(queries) {
    this.hashToQueryMap = {}
    queries.forEach((query) => {
      this.hashToQueryMap[query.documentId] = query
    })
    return (req, res, next) => {
      this.handle(req, res, next)
    }
  }
  // ...
}

For every incoming request to this endpoint, the handler looks into the hashToQueryMap to get the corresponding query along with its variables. Then it sends this request to the GraphQL service and handles error cases.

const apolloClient = require("./apolloClient")
 
module.exports = class GraphqlRequestHandler {
  // ...
  async handle(req, res, next) {
    const query = this.getQueryForHash(req)
    const variables = req.body.variables
 
    if (query) {
      try {
        console.log("sending graphql query")
        const response = await apolloClient.query({ query, variables })
        console.log("returning graphql response")
        res.send(response)
      } catch (error) {
        console.log("error while sending graphql query")
        next(error)
      }
    } else {
      console.log("no matching query for hash found")
      res.status(400).send()
    }
  }
  // ...
}

Getting the query for the current request could look like the following.

module.exports = class GraphqlRequestHandler {
  // ...
  getQueryForHash(req) {
    const persistedQueryHash =
      req.body.extensions &&
      req.body.extensions.persistedQuery &&
      req.body.extensions.persistedQuery.sha256Hash
    if (persistedQueryHash) {
      console.log("search query for provided hash " + persistedQueryHash)
      return this.hashToQueryMap[persistedQueryHash]
    } else {
      console.log("no hash provided")
      return undefined
    }
  }
  // ...
}

And last but not least, creating the Apollo client that sends the actual query to the GraphQL service is shown below. We could also send the query using a simple POST request, but using the Apollo client helps us return the same result shape the client expects.

const { InMemoryCache } = require("apollo-cache-inmemory")
const { ApolloClient } = require("apollo-client")
const { createHttpLink } = require("apollo-link-http")
const fetch = require("cross-fetch")
 
const httpLink = createHttpLink({
  uri: "http://graphql.service.url/",
  fetch,
})
module.exports = new ApolloClient({
  link: httpLink,
  cache: new InMemoryCache(),
})

Additional thoughts

That’s it for now. Are you already using persisted queries? What is holding you back? Here are some things that came to my mind while working on this blog post.

Take care of your deployment

As the query sources are shared between the client and the server, it is crucial to deploy updates simultaneously. That is, if you change or add a query, make sure to deploy the client and the server together. This ensures that the hashes stay in sync and the server can respond to all of them.

Some words about Testing

There is nothing really special about testing this new service. But while creating this example, I wrote some tests along the way; you can check them out in the repository. The trick is to avoid calling require.context() within the code under test, as jest does not get along with this webpack functionality.

Christoph Walter

Hi, I am Christoph. My ideas (and problems) usually start somewhere in the frontend. I bring the UX perspective on board while building quality software that works as simply as possible. That’s why I also value good tests. From time to time I make sure to keep up with the evolving world of backend services and technologies.
