Error handling in GraphQL can be surprisingly difficult. The ability to tie together what would otherwise be many requests is both its greatest strength and a source of complexity when things go wrong. What does it mean when part of the query was successful but another part failed? In addition, there is no standardized set of error codes for us to lean on; instead, we need to look to emerging leaders such as the Apollo framework for inspiration. Fortunately, the specification does give us the tools we need to provide rich error handling that can shed some light on this problem.

Why GraphQL at Atomist?

GraphQL is a wonderfully expressive medium. At Atomist, we provide an API that turns your software development into a connected graph of data. An API that allows you to traverse these relationships freely is essential, and GraphQL fits the bill exactly. With this, we can query, for example, a git commit and find out who wrote it, what issues are related, what dependencies this commit introduces or removes and what chat channels we should notify. The ability to explore these relationships in a documented medium and chain together these data into a single query is compelling. By providing extension points, GraphQL even allows us to customize the schema for each customer, tying in data specific to that team's needs.

{
  Push {
    commits {
      sha
      message
      author {
        name
      }
      image {
        imageName
        pods {
          state
        }
      }
    }
    repo {
      channels {
        name
      }
    }
  }
}

What types of error are there?

We need to deal with a number of different causes of error, but they can be broadly split into two categories: those that apply to the whole request and those that only apply to a subset of it. For example, authentication applies to the whole request—if your credentials are invalid, then nothing can be returned. However, authorization might apply to a subset of the request—you might have claims sufficient to view some parts of the results but not others. This is important as in the first case, we can return no data and a general error code, but in the second case, we must return the successful data along with an error that explains which parts you were not authorized to access.

Whilst the schema in GraphQL saves us from many user input errors, they are still very possible. We might define that the input to a mutation is of type Int in the schema, but the service dealing with those data might further constrain it to be a positive number. Errors must be returned if this is violated. Indeed, it's also possible to send data that does not conform to the schema at all, although typically, this would be a concern of your GraphQL library.
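The positive-number case can be sketched as a small service-side check. This is only an illustration: the helper name validatePositive, the argument name and the path are hypothetical, and the BAD_USER_INPUT code is an assumption borrowed from Apollo's conventions rather than anything the GraphQL spec defines.

```typescript
// Shape of a single entry in the "errors" array; only "message" is required.
interface GraphQLErrorEntry {
  message: string;
  path?: Array<string | number>;
  extensions?: { code: string };
}

// Hypothetical service-side check: the schema only guarantees an Int,
// so the service enforces the stricter "positive" rule itself.
function validatePositive(
  argName: string,
  value: number,
  path: Array<string | number>
): GraphQLErrorEntry | null {
  if (value > 0) {
    return null; // input is acceptable, nothing to report
  }
  return {
    message: `Argument "${argName}" must be a positive number, got ${value}`,
    path,
    // Assumed code, following Apollo's BAD_USER_INPUT convention.
    extensions: { code: "BAD_USER_INPUT" },
  };
}

// Hypothetical mutation "setCount" with an Int argument "count".
const rejected = validatePositive("count", -5, ["setCount"]);
const accepted = validatePositive("count", 3, ["setCount"]);
```

Returning an error entry rather than throwing keeps the sketch independent of any particular server library; in practice the entry would be handed to whatever mechanism your GraphQL library uses to populate the errors array.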

Unexpected errors in the backend will also occur. Sometimes these could have the effect of the whole request failing with an HTTP error—network-level failures, for example. Other times they may be localized such that the bulk of the response returns successfully, but some component fails. These partial failures lead to a responsibility on the consumer to always check for errors in the response, even when data are returned.
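To make the distinction concrete, here is a minimal sketch of how a consumer might classify a response. The outcome names and the classifyOutcome function are hypothetical, and treating every non-2xx status as an HTTP-level failure is a simplifying assumption; some servers return a GraphQL error body alongside a 4xx status.

```typescript
interface GraphQLErrorEntry {
  message: string;
}

interface GraphQLResponseBody {
  errors?: GraphQLErrorEntry[];
  data?: { [key: string]: unknown } | null;
}

// Hypothetical outcome names used only in this sketch.
type Outcome = "http-failure" | "request-failure" | "partial-failure" | "success";

// httpStatus and body are assumed to come from whatever HTTP client is in use;
// body is null when the payload could not be parsed as a GraphQL response.
function classifyOutcome(httpStatus: number, body: GraphQLResponseBody | null): Outcome {
  if (httpStatus < 200 || httpStatus >= 300 || body === null) {
    return "http-failure"; // the whole request failed below the GraphQL layer
  }
  const hasErrors = body.errors !== undefined && body.errors.length > 0;
  const hasData = body.data !== undefined && body.data !== null;
  if (!hasErrors) {
    return "success";
  }
  // Errors are present: data alongside them means only part of the query failed.
  return hasData ? "partial-failure" : "request-failure";
}
```

The important branch is the last one: a response that carries data can still carry errors, so checking for data alone is not enough.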

What should errors look like?

Firstly we must understand what the GraphQL spec says about errors. Let's look at some examples.

{
  "errors": [
    {
      "message": "Authentication failure"
    }
  ]
}

The above shows the simplest allowable error response from a GraphQL API. Here we have an error array with a single entry. Each entry must have a message key that contains a human-readable string describing the error. This would be a suitable response for something like an authentication error where no response can be served, and the error affects the entire request.

{
  "errors": [
    {
      "message": "Not authorized to access this field",
      "locations": [ { "line": 3, "column": 10 } ],
      "path": [ "User", "secrets" ],
      "extensions": {
        "code": "UNAUTHORIZED"
      }
    }
  ],
  "data": "maybe..."
}

Here we can see an error response more suitable to a partial failure; data may or may not be present in this response. The locations field points to the line and column in your request that the error relates to. In cases where no response can be provided, such as a validation failure, this is all you will get. However, if this is a partial error and a response is returned, we should also return a path attribute that corresponds to the point in the response where we have been unable to fulfill the request; typically, this will be a null field. The path is therefore useful in determining whether a null is legitimate data or the result of a failure.
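As a rough sketch of how a consumer could use path for this, the hypothetical helper below reports whether a null found at a given position in the response is explained by one of the returned errors. It matches on a prefix of the error's path, on the assumption that the null may sit at an ancestor of the field that actually failed.

```typescript
type PathSegment = string | number;

interface GraphQLErrorEntry {
  message: string;
  locations?: Array<{ line: number; column: number }>;
  path?: PathSegment[];
  extensions?: { [key: string]: unknown };
}

// True when some error's path starts with nullPath, i.e. the error occurred
// at that position in the response or somewhere beneath it.
function nullCausedByError(errors: GraphQLErrorEntry[], nullPath: PathSegment[]): boolean {
  return errors.some((error) => {
    const errorPath = error.path;
    if (errorPath === undefined || errorPath.length < nullPath.length) {
      return false; // no path, or the error sits above the position we are asking about
    }
    return nullPath.every((segment, index) => errorPath[index] === segment);
  });
}

// The error entry from the example response above.
const sampleErrors: GraphQLErrorEntry[] = [
  {
    message: "Not authorized to access this field",
    locations: [{ line: 3, column: 10 }],
    path: ["User", "secrets"],
    extensions: { code: "UNAUTHORIZED" },
  },
];

const secretsFailed = nullCausedByError(sampleErrors, ["User", "secrets"]); // true
const nameFailed = nullCausedByError(sampleErrors, ["User", "name"]); // false
```

A null at User.name with no matching error can be treated as legitimate data, whereas a null at User.secrets should be treated as a failure.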

Finally, we can see that this example contains an extensions entry. The extensions field is where the GraphQL spec allows you to place your own keys. Here we can see the approach taken by Apollo, which introduces a code field. In terms of handling GraphQL errors, this field is essential. It allows a consumer to branch and react programmatically to errors. These error codes can be used to form the basis of the contract between client and server when handling known errors. HTTP response codes can often serve as good inspiration for some categories of code, although we can be much more specific here if required.
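A sketch of what branching on the code might look like on the client follows. UNAUTHORIZED is taken from the example above; UNAUTHENTICATED and BAD_USER_INPUT are assumed here in the style of Apollo's codes, and the action names are purely hypothetical.

```typescript
interface GraphQLErrorEntry {
  message: string;
  extensions?: { code?: string };
}

// Hypothetical client-side reactions, named only for this sketch.
type ClientAction =
  | "reauthenticate"
  | "hide-field"
  | "show-validation-message"
  | "report-unexpected";

function actionFor(error: GraphQLErrorEntry): ClientAction {
  const code = error.extensions !== undefined ? error.extensions.code : undefined;
  switch (code) {
    case "UNAUTHENTICATED": // assumed code: credentials missing or invalid
      return "reauthenticate";
    case "UNAUTHORIZED": // code from the example above
      return "hide-field";
    case "BAD_USER_INPUT": // assumed code: input rejected by the service
      return "show-validation-message";
    default:
      // Unknown or missing codes fall back to generic handling,
      // so adding new server-side codes does not break older clients.
      return "report-unexpected";
  }
}
```

Keeping a default branch matters because the set of codes is a convention between client and server rather than something the schema enforces.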

Also of note here is that the errors array appears before the data. This is a recommendation from the GraphQL spec and a good one if you can implement it. When reading a response, it is very easy to miss that there are errors when the data is large.
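If you build the response body by hand, one simple way to get this ordering is to rely on JSON.stringify emitting string keys in insertion order. The function name below is hypothetical; most GraphQL server libraries handle serialization for you, so treat this as an illustration only.

```typescript
interface GraphQLErrorEntry {
  message: string;
}

// Serializes a response with "errors" ahead of "data".
// "data" is omitted entirely when it is undefined (for example, a whole-request failure).
function buildResponseBody(errors: GraphQLErrorEntry[], data?: unknown): string {
  const body: { [key: string]: unknown } = {};
  if (errors.length > 0) {
    body["errors"] = errors; // inserted first so it is serialized first
  }
  if (data !== undefined) {
    body["data"] = data;
  }
  return JSON.stringify(body, null, 2);
}

const serialized = buildResponseBody(
  [{ message: "Not authorized to access this field" }],
  { User: { secrets: null } }
);
```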

Making the best of null

From time to time, unexpected null fields are an inevitability, either because a field has not been written or because fetching data from a downstream service that makes up your GraphQL API has failed. By making good use of non-nullable fields, denoted with an exclamation mark, we can handle these cases most appropriately and save work for our consumers.

When a null value is encountered for a field marked as non-null, the null is bubbled up until it hits a point in the schema where null is allowed. In the schema below, sha is non-null, so if the sha of a commit cannot be resolved, the whole Commit entry becomes null within the commits list, which does permit null items.

type Push {
  commits: [Commit]
}
type Commit {
  sha: String!
}
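The bubbling rule can be illustrated with a small simulation. This is a simplified sketch, not how any particular GraphQL library implements it: the TypeRef descriptors, the NonNullViolation marker and the completeValue function are all hypothetical, and real execution would also record an entry in the errors array.

```typescript
// Simplified type descriptors; real GraphQL libraries model types differently.
type TypeRef =
  | { kind: "scalar"; nonNull: boolean }
  | { kind: "list"; nonNull: boolean; of: TypeRef }
  | { kind: "object"; nonNull: boolean; fields: { [name: string]: TypeRef } };

// Thrown when null reaches a non-null position; caught by the nearest nullable ancestor.
class NonNullViolation {}

function completeValue(type: TypeRef, value: unknown): unknown {
  try {
    return completeInner(type, value);
  } catch (err) {
    if (err instanceof NonNullViolation && !type.nonNull) {
      return null; // this position may be null, so the bubbling stops here
    }
    throw err; // non-null position (or unrelated error): keep bubbling upwards
  }
}

function completeInner(type: TypeRef, value: unknown): unknown {
  if (value === null || value === undefined) {
    if (type.nonNull) {
      throw new NonNullViolation();
    }
    return null;
  }
  switch (type.kind) {
    case "scalar":
      return value;
    case "list": {
      const itemType = type.of;
      return (value as unknown[]).map((item) => completeValue(itemType, item));
    }
    case "object": {
      const source = value as { [name: string]: unknown };
      const result: { [name: string]: unknown } = {};
      for (const name of Object.keys(type.fields)) {
        result[name] = completeValue(type.fields[name], source[name]);
      }
      return result;
    }
  }
}

// The schema above: Push { commits: [Commit] } and Commit { sha: String! }.
const commitType: TypeRef = {
  kind: "object",
  nonNull: false,
  fields: { sha: { kind: "scalar", nonNull: true } },
};
const pushType: TypeRef = {
  kind: "object",
  nonNull: false,
  fields: { commits: { kind: "list", nonNull: false, of: commitType } },
};

// The second commit has no sha, so that whole list entry becomes null.
const completed = completeValue(pushType, {
  commits: [{ sha: "abc123" }, { sha: null }],
});
// completed is { commits: [ { sha: "abc123" }, null ] }
```

Had the list been declared as [Commit!] instead, the null would have bubbled one level further and the whole commits field would have come back as null, which is why the placement of exclamation marks deserves some thought.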

Dealing with errors as a consumer

As a client consuming a GraphQL API, there is a responsibility to respond and react any time an errors entry is included in the response. In some ways, this can seem more complex than when dealing with traditional APIs. However, it is often better to think of a single GraphQL request as replacing many potential calls. We should compare the error handling to what would have been required had we been forced to make all those requests separately. From this perspective, it doesn't seem too onerous.

If a request contains both errors and data, the consumer must decide what it can do. Ideally, it will still be able to do something useful with the response and will be able to decide on how best to deal with the missing data and errors. As a fallback, it may be that the client has to treat the whole request as failed and either retry or display an error—indeed, this was the default behavior of the popular Apollo client in version 1.
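The two strategies described above can be sketched as a single function with a switchable policy. The policy names, result shapes and handleResponse function are hypothetical and are not the API of Apollo or any other client library.

```typescript
interface GraphQLErrorEntry {
  message: string;
  path?: Array<string | number>;
}

interface GraphQLResponseBody<T> {
  errors?: GraphQLErrorEntry[];
  data?: T | null;
}

// Hypothetical policy names for this sketch.
type ErrorPolicy = "all-or-nothing" | "best-effort";

type Handled<T> =
  | { status: "ok"; data: T }
  | { status: "degraded"; data: T; errors: GraphQLErrorEntry[] }
  | { status: "failed"; errors: GraphQLErrorEntry[] };

function handleResponse<T>(body: GraphQLResponseBody<T>, policy: ErrorPolicy): Handled<T> {
  const errors = body.errors !== undefined ? body.errors : [];
  const data = body.data;
  if (data === undefined || data === null) {
    return { status: "failed", errors }; // nothing usable came back
  }
  if (errors.length === 0) {
    return { status: "ok", data };
  }
  if (policy === "best-effort") {
    // Keep the partial data and surface the errors alongside it.
    return { status: "degraded", data, errors };
  }
  // Fallback: treat the whole request as failed so the caller can retry or show an error.
  return { status: "failed", errors };
}
```

Returning the errors in the degraded case lets the caller decide, field by field, how to present the missing data.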

The future

Good error handling on both the client and server is essential when using GraphQL. The GraphQL spec does give us the tools required to express and handle errors appropriately. That said, the required use of error codes does seem to lead us to an implicit rather than an explicit contract being formed. This seems somewhat contrary to the goals of highly specified GraphQL. There are some interesting ideas around explicitly specifying errors using union types—we could explicitly state that a non-scalar field either returns itself or returns the error type. It does seem though that more explicit error handling could be added to the specification to solidify the contract whilst keeping the schema clean.