
GraphQL business error responses revisited

7.8.2022 | 10 minutes of reading time

GraphQL excels by encouraging us to think from the actual client requirements rather than from sometimes prematurely abstract, server-side grand conceptualizations. It's essential to strike the right balance between the perspectives of all stakeholders, foremost the clients, to evolve truly superb APIs; and APIs need to be truly superb to remain usable and extensible over time, as complexity will inevitably grow. While the GraphQL Schema lets clients see what's available, clients have to specify what they actually need, i.e. which fields in particular. This allows the server to measure the usage of every single field and eventually remove fields that are no longer needed, which is crucial to know, especially when migrating APIs. This way of thinking about APIs should also hold true when a service needs to report business errors. Let's explore the options GraphQL gives us.

Technical Errors

In GraphQL, a response can contain not only the data but also a list of errors to report technical problems. For example, when a request is invalid, the service should be as helpful as possible, so the client can easily understand how to fix it, e.g. by responding:

Listing 1. Validation Error
{
  "errors": [
    {
      "message": "The type 'Order' doesn't have a field 'foo'. Please check the Schema.",
      "locations": [{"line": 4, "column": 5}],
      "extensions": {
        "description": "Field 'foo' in type 'Order' is undefined",
        "queryPath": ["order", "foo"]
      }
    }
  ],
  "data": null
}

If, on the other hand, there is a problem within a service, it should indicate to the client that either a) a retry is the appropriate reaction to a temporary glitch:

Listing 2. Availability Error
{
  "errors": [
    {
      "message": "We have a scheduled downtime. Please retry your request later.",
      "extensions": {
        "retryAfter": "2023-05-11T17:53:13.523Z"
      }
    }
  ],
  "data": null
}

Or b) that there's a bug in the service, in which case it's nice to tell clients whether and what they can do to get it fixed:

Listing 3. Internal Bug
{
  "errors": [
    {
      "message": "Sorry, we have a bug in retrieving the Customer. If you need to see our progress in fixing it, please follow the link below.",
      "extensions": {
        "queryPath": ["order", "customer"],
        "bugfixUrl": "https://bugs.example.org/error-instance/0305a282-9941-4714-83a8-08e5a3353a12"
      }
    }
  ],
  "data": {
    "order": {
      "id": "2022/183435",
      "orderDate": "2023-05-11T17:53:13.523Z",
      "customer": null
    }
  }
}

Note the queryPath extension, which links an error to a specific field within the requested data. GraphQL allows for partial results: if only part of the request can't be fulfilled, e.g. the customer, the data contains whatever could be retrieved, while the errors explain the missing parts.
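A client can use that extension to explain each null field. The following sketch assumes TypeScript as the client language. `errorsForPath` and the response types are hypothetical helpers, not part of any GraphQL library. It collects the errors attached to a given path, using the response from Listing 3:

```typescript
// Minimal response shapes; `queryPath` is the extension used in this article.
interface GraphQLError {
  message: string;
  extensions?: { queryPath?: string[]; [key: string]: unknown };
}

interface GraphQLResponse<T> {
  data: T | null;
  errors?: GraphQLError[];
}

// Returns the errors whose queryPath points exactly at `path`.
function errorsForPath<T>(response: GraphQLResponse<T>, path: string[]): GraphQLError[] {
  return (response.errors ?? []).filter((error) => {
    const queryPath = error.extensions?.queryPath;
    return (
      queryPath !== undefined &&
      queryPath.length === path.length &&
      queryPath.every((segment, i) => segment === path[i])
    );
  });
}

// The partial result from Listing 3.
const response: GraphQLResponse<{
  order: { id: string; orderDate: string; customer: null };
}> = {
  errors: [
    {
      message:
        "Sorry, we have a bug in retrieving the Customer. If you need to see our progress in fixing it, please follow the link below.",
      extensions: {
        queryPath: ["order", "customer"],
        bugfixUrl: "https://bugs.example.org/error-instance/0305a282-9941-4714-83a8-08e5a3353a12",
      },
    },
  ],
  data: { order: { id: "2022/183435", orderDate: "2023-05-11T17:53:13.523Z", customer: null } },
};

// Exactly one error explains why `customer` is null; `order` itself has none.
const customerErrors = errorsForPath(response, ["order", "customer"]);
```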

These examples are technical errors: automated handling (other than retrying) is not reasonable. They can happen anytime and everywhere, so they don't have to be part of the Schema.
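Because retrying is the only sensible automated reaction, a client's handling of technical errors can stay small. The sketch below assumes that the `retryAfter` extension from Listing 2 carries an ISO-8601 timestamp. `reactTo` and `TechnicalReaction` are hypothetical names:

```typescript
// Only the parts of a GraphQL error this sketch needs.
interface GraphQLError {
  message: string;
  extensions?: { retryAfter?: string; [key: string]: unknown };
}

type TechnicalReaction =
  | { kind: "retry"; delayMs: number }
  | { kind: "giveUp"; messages: string[] };

// Retry if the service announced a retryAfter timestamp (as in Listing 2);
// otherwise give up and surface the human-readable messages.
function reactTo(errors: GraphQLError[], now: Date = new Date()): TechnicalReaction {
  const retryTimes = errors
    .map((error) => error.extensions?.retryAfter)
    .filter((value): value is string => typeof value === "string")
    .map((value) => Date.parse(value))
    .filter((ms) => !isNaN(ms));
  if (retryTimes.length > 0) {
    // Wait until the latest announced time, but never a negative delay.
    const delayMs = Math.max(0, Math.max(...retryTimes) - now.getTime());
    return { kind: "retry", delayMs };
  }
  return { kind: "giveUp", messages: errors.map((error) => error.message) };
}

const reaction = reactTo(
  [
    {
      message: "We have a scheduled downtime. Please retry your request later.",
      extensions: { retryAfter: "2023-05-11T17:53:13.523Z" },
    },
  ],
  new Date("2023-05-11T17:53:03.523Z"),
);
// reaction is { kind: "retry", delayMs: 10000 }
```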

But there's also a different class of reasons for some data to be unavailable. Maybe an order has been locked and hidden because it was considered fraudulent. Let's call such situations business errors. They are not exceptions, as they are part of the normal operation of the system; nobody should worry; everything's fine; no stacktraces in the logs, please.

If we report such errors via the same errors mechanism, the clients will ask us for a list of all possible errors, and whether we can introduce, for example, a code extension, so they can react to them properly instead of parsing the human-readable message, which could change at any time. But this would be out-of-band communication: clients would have to consult a second source besides the Schema, e.g. a wiki; and wikis, as the saying goes, are where information goes to die: they are bound to become outdated. We'd rather make business errors a direct part of the Schema.

Result Union

There is an excellent talk by Sasha Solomon that inspired me to rethink the whole topic and write this blog post. She shows exactly why business errors need to be result types made visible in the Schema, and provides an ingenious way to do so: the valid response value and all possible business errors are wrapped in a Union result:

Listing 4. OrderResult Union
type Query {
    order(id: String): OrderResult
}
union OrderResult = Order | OrderNotFound | OrderLocked
type Order {
    id: ID
    orderDate: Date
}
type OrderNotFound { # ①
    positive: Boolean
}
type OrderLocked {
    start: DateTime
    reason: LockReason
}
enum LockReason {
    FRAUD
    MANUAL
}

① The positive field is just a dummy value that is always true. I define it here because I want all errors to be Types and not Scalars, so I can add real fields later; and a Type must have at least one field.

A query could be:

Listing 5. OrderResult Query
query order($id: String) {
    order(id: $id) {
        __typename
        ... on Order {
            id
            orderDate
        }
        ... on OrderLocked {
            reason
            start
        }
    }
}

Clients must always include code that handles unknown __typename values, so new errors can be added to the Union without breaking anything. When we add an OrderCancelled type, clients can see in the Schema that there's a new possible __typename value, for which they will probably have to add special handling.
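In a TypeScript client, for example, that fallback could look like the following sketch. The types are hand-written here for the selection in Listing 5, although a code generator would usually produce them. `describeOrder` and `isKnown` are hypothetical helpers:

```typescript
// Known members of the OrderResult Union, as selected in Listing 5.
type KnownOrderResult =
  | { __typename: "Order"; id: string; orderDate: string }
  | { __typename: "OrderLocked"; reason: "FRAUD" | "MANUAL"; start: string }
  | { __typename: "OrderNotFound" };

// What the client actually receives: any __typename, including ones added later.
type RawOrderResult = { __typename: string; [field: string]: unknown };

const KNOWN_TYPENAMES = ["Order", "OrderLocked", "OrderNotFound"];

function isKnown(result: RawOrderResult): result is KnownOrderResult {
  return KNOWN_TYPENAMES.indexOf(result.__typename) >= 0;
}

function describeOrder(result: RawOrderResult): string {
  if (!isKnown(result)) {
    // A business error added to the Union after this client was written,
    // e.g. OrderCancelled: degrade gracefully instead of failing.
    return `Order unavailable (${result.__typename})`;
  }
  switch (result.__typename) {
    case "Order":
      return `Order ${result.id} from ${result.orderDate}`;
    case "OrderLocked":
      return `Order locked since ${result.start} (${result.reason})`;
    case "OrderNotFound":
      return "Order not found";
    default: {
      // Compile-time check that every known member is handled above.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```

The `default` branch is only an exhaustiveness check for the compiler. The handling of unknown types happens in the `isKnown` guard, so a future OrderCancelled produces a generic message instead of crashing the client.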

(Side note, just for the sake of completeness: a very similar solution can be achieved with GraphQL Interfaces; but I won't go into details, as it doesn't add a lot to the argument.)

For nested errors, the result Types can be nested, too:

Listing 6. Nested CustomerResult Union
type Order {
    id: ID
    orderDate: Date
    customer: CustomerResult
}
union CustomerResult = Customer | CustomerBlocked
type CustomerBlocked {
    positive: Boolean
}

One Way Street

The Union-based solution is great: the Schema clearly documents what can possibly go wrong, business-wise, and what alternative data is available. But it doesn't mitigate one major problem: communication is one-way, from server to client. The backend still doesn't see which clients handle which errors, as clients can react to any of the possible __typename values without selecting even a single detail field, e.g. for OrderNotFound. One could say that the __typename is just the code extension from before, now made part of the Schema. That is a major step, but the backend still can't see that some error is no longer needed and can be removed, or that a specific client has started to handle a specific business error. The concept, as clean as it is, is somewhat impure around the edges: it regulates the communication from the server to the client, but doesn't improve the communication of requirements from the client to the server, which is a central benefit of GraphQL.

Result Fields

To make the clients' requirements visible, one option is to declare the OrderResult from above as a regular wrapper Type instead of a Union, i.e. to let clients select which business errors they want to handle:

Listing 7. OrderResult Wrapper
type OrderResult {
    order: Order
    error_orderLocked: OrderLocked
    error_orderNotFound: OrderNotFound
}

Normally, exactly one of these fields is set, which is, sadly, not visible in the Schema; it must be documented as a convention.

If an error situation arises that a client doesn't expect, e.g. because they don't know that orders can be locked, they won't have selected the corresponding error field. To avoid the order field being null without the client having any idea why, we fall back to reporting the error via the classic, technical errors field. This allows us to add new business errors at any time: e.g. when we add the feature of cancelling orders, we add an error_orderCancelled field, and existing clients receive it as a technical error.

If it's still possible to retrieve the order data and the new error is not critical, i.e. existing clients can safely ignore the fact that an order is, e.g., locked while reading, we can simply not report the error.

In both cases, if clients want to react to a new error, they can simply select the corresponding new error field. And on the other hand, if we see that no client selects an error field anymore, we can safely remove it.
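The server-side fallback can be sketched independently of any particular GraphQL server library. Here, the `selected` array stands in for the field names the server would read from the query's selection set. `Outcome`, `resolveOrderResult` and the error messages are assumptions for illustration:

```typescript
// Hypothetical result of the domain lookup, independent of any GraphQL library.
type Outcome =
  | { kind: "order"; order: { id: string; orderDate: string } }
  | { kind: "orderLocked"; start: string; reason: "FRAUD" | "MANUAL" }
  | { kind: "orderNotFound" };

interface TechnicalError {
  message: string;
  extensions: { queryPath: string[] };
}

interface Resolved {
  result: { [field: string]: unknown };
  errors: TechnicalError[];
}

// Builds the OrderResult wrapper of Listing 7 for the fields the client selected.
function resolveOrderResult(selected: string[], outcome: Outcome): Resolved {
  if (outcome.kind === "order") {
    return { result: { order: outcome.order }, errors: [] };
  }
  const errorField = outcome.kind === "orderLocked" ? "error_orderLocked" : "error_orderNotFound";
  if (selected.indexOf(errorField) >= 0) {
    // The client declared that it handles this business error: report it in-band.
    const value =
      outcome.kind === "orderLocked"
        ? { start: outcome.start, reason: outcome.reason }
        : { positive: true };
    return { result: { order: null, [errorField]: value }, errors: [] };
  }
  // Not selected: fall back to a technical error explaining the null order.
  // Path: Query.order -> OrderResult.order.
  return {
    result: { order: null },
    errors: [
      {
        message: outcome.kind === "orderLocked" ? "The order is locked." : "The order was not found.",
        extensions: { queryPath: ["order", "order"] },
      },
    ],
  };
}

// A client that selected only `order` receives a technical error instead.
const fallback = resolveOrderResult(["order"], {
  kind: "orderLocked",
  start: "2022-07-01T10:00:00Z",
  reason: "FRAUD",
});
```

Recording which clients select which `error_` fields in the same selection sets is what would show when an error field is no longer used and can be removed.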

Nesting works exactly like with the Union responses described above.

Breaking Changes

But introducing such a result wrapper Type (even if it's a Union) is a breaking change: we can't do that as long as clients still expect us to return the data directly. We could use wrappers right from the beginning, every time, even before we know whether we might need to add business errors later. But even a scalar field could end up in a business error state, e.g. just the customer name could be blocked, so we'd have to wrap it in a CustomerNameResult. Doing so for every field doesn't make sense.

So we need a strategy to migrate existing queries: define a new field with a new name that returns the new wrapper type, while the old field still returns the flat data. Once the old field is no longer selected, we can remove it. To have a standard naming convention for the wrapped fields, we could add a suffix to the name, e.g. append Result to the old order field to create orderResult. But this can easily be confusing: a mutation createGame would become createGameResult, which sounds like something very different. So I'd opt for a single underscore, e.g. order_; but that isn't self-explanatory. And would we do that also for fields where we know from the beginning that they have some error states? It's not easy to make such an API consistent.

Flat

To prevent breaking changes by design, we could add the error fields directly to the return Type:

Listing 8. Order With Error Fields
type Query {
    order(id: String): Order
}
type Order {
    id: ID
    orderDate: Date
    # ...
    error_orderLocked: OrderLocked
    error_orderNotFound: OrderNotFound
}

Again, if some error occurs that the client didn't select, we fall back to reporting it like a technical error.

Nesting works naturally, i.e. the Customer type could have an error_customerBlocked field.

At first glance, this solution looks perfect, but it has a major drawback: all data fields must be nullable, because an Order that only reports, e.g., error_orderNotFound has no id or orderDate to return.
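On the client, this means checking the selected error fields before trusting any data field. The following sketch assumes a TypeScript client and hand-written types for one particular selection. `FlatOrder`, `OrderView` and `toView` are hypothetical:

```typescript
// Hand-written type for one selection on the flat Order of Listing 8.
// Data fields are nullable: an Order that only reports error_orderNotFound
// has no id or orderDate to return.
interface FlatOrder {
  id: string | null;
  orderDate: string | null;
  error_orderLocked?: { reason: "FRAUD" | "MANUAL"; start: string } | null;
  error_orderNotFound?: { positive: boolean } | null;
}

type OrderView =
  | { kind: "order"; id: string; orderDate: string }
  | { kind: "locked"; reason: string; since: string }
  | { kind: "notFound" };

function toView(order: FlatOrder): OrderView {
  // Check the selected business error fields first, then trust the data.
  if (order.error_orderNotFound) {
    return { kind: "notFound" };
  }
  if (order.error_orderLocked) {
    return { kind: "locked", reason: order.error_orderLocked.reason, since: order.error_orderLocked.start };
  }
  if (order.id === null || order.orderDate === null) {
    // By convention, a business error the client didn't select arrives in
    // `errors` instead, so reaching this point indicates a bug.
    throw new Error("Order data missing without a selected business error");
  }
  return { kind: "order", id: order.id, orderDate: order.orderDate };
}
```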

Extra Queries

GraphQL allows us to request several root fields in a single query, so we can add separate query fields for the business errors. To link them to the original query, we use the naming convention <query-path>_error_<code>, with the segments of the query path joined by underscores:

Listing 9. Order and extra error fields
type Query {
    order(id: String): Order
    order_error_orderLocked: OrderLocked
    order_error_orderNotFound: OrderNotFound
    order_customer_error_customerBlocked: CustomerBlocked
}

An actual query could be:

Listing 10. Order and extra error field query
query order($id: String) {
    order(id: $id) {
        id
        orderDate
        customer {
            name
        }
    }
    order_error_orderLocked {
        reason
        start
    }
    order_customer_error_customerBlocked {
        positive
    }
}

Mangling the query path into a field name may feel messy, but it works.
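The mangling is mechanical, so both client and server can derive it. The following TypeScript sketch of the convention uses hypothetical helper names. The parser assumes that field names on the path contain no underscores and that no error code contains `_error_`:

```typescript
// Builds the extra root field name for a business error, following the
// <query-path>_error_<code> convention.
function errorFieldName(queryPath: string[], code: string): string {
  return [...queryPath, "error", code].join("_");
}

// Inverse mapping, under the assumptions stated above.
function parseErrorFieldName(field: string): { queryPath: string[]; code: string } | null {
  const marker = "_error_";
  const index = field.lastIndexOf(marker);
  if (index < 0) {
    return null;
  }
  return {
    queryPath: field.slice(0, index).split("_"),
    code: field.slice(index + marker.length),
  };
}

// errorFieldName(["order", "customer"], "customerBlocked")
//   → "order_customer_error_customerBlocked"
```

A server could apply `parseErrorFieldName` to the root fields of incoming queries to see which clients handle which business errors.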

Here, too, if some error occurs that the client didn't select, we fall back to reporting it like a technical error.

Note that the return type of order must be nullable now.

Conclusion

So these are the five options we have, and all have their merits:

Option | Visible in Schema (server→client) | Visible in Query (client→server) | Extensible (no breaking changes) | Side effects
Technical Errors | 𐄂 | 𐄂 | ✓ |
Result Union | ✓ | 𐄂 | 𐄂 |
Result Fields | (✓) | ✓ | 𐄂 |
Flat | (✓) | ✓ | ✓ | All data fields must be nullable
Extra Queries | (✓) | ✓ | ✓ | The data field must be nullable
(✓) means that, while the business errors are visible in the Schema, there is an extra convention on top, that you'll need to know in order to truly understand the Schema.

No option is perfect, so the question remains: which one should I choose? Is it just a matter of taste? The elegance of the Schema when using Unions is compelling, but the drawbacks are severe. I'm especially hesitant to prematurely introduce 'best practices' like result wrapper Types; I've seen too many best practices turn out not to be best in all cases. My personal conclusion is that Flat error fields provide the best balance between extensibility and visibility.

What do you think? Let's discuss!
