The tension between data & use-case driven GraphQL APIs

If you’ve been following my posts over the past months and even years, you know how important I think it is to design well-crafted GraphQL schemas that express real use cases. When in doubt, designing a GraphQL schema for behaviors instead of for data has been my go-to rule. An older post on “Anemic Mutations” explains my rationale behind that principle.

  • Timeouts: We’re pretty aggressive with GraphQL query timeouts on our API at GitHub; we don’t want to let a gigantic GraphQL query run for too long. However, purely data-driven clients might need to make pretty large queries (one query only, yay) to achieve their goals. Even though it’s a valid use case and not an abuse scenario, there is quite a high chance queries could time out if they fetch hundreds, or even thousands, of records.

Ship a new data driven schema

One option could be to expose a totally new GraphQL endpoint/schema for more data-driven use cases: get all issues and their comments without pagination, with batch-loaded types and resolvers. This could possibly even live in the same schema as different fields, but since they’re such different use cases, I can see them being in a completely different schema. The timeout problem might still be hard to solve, however, because these kinds of use cases often aren’t simple to consume synchronously. So what if… we could run queries asynchronously instead?

Asynchronous GraphQL Jobs

    POST /async_graphql

    {
      allTheThings {
        andEvenMore {
          ...
        }
      }
    }

    202 ACCEPTED
    Location: /async_graphql/HS3HlKN76EI5es7qSTHNmA

    GET /async_graphql/HS3HlKN76EI5es7qSTHNmA

    202 ACCEPTED
    Location: /async_graphql/HS3HlKN76EI5es7qSTHNmA

    GET /async_graphql/HS3HlKN76EI5es7qSTHNmA

    { "data": { ... } }
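
From the client's side, the exchange above boils down to "submit, then poll until the result shows up". Here is a minimal Python sketch of that loop, under a few assumptions that are mine and not part of any real API: the `send` callable is a hypothetical transport returning `(status, headers, body)`, a finished job answers with a 200 and the JSON result, and the `poll_interval` and `max_polls` parameters are made up for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# One HTTP exchange as (status_code, headers, parsed_json_body).
# This shape is an assumption for the sketch, not a real client library.
Response = Tuple[int, Dict[str, str], Optional[dict]]


def run_async_query(
    send: Callable[[str, str, Optional[str]], Response],
    query: str,
    endpoint: str = "/async_graphql",
    poll_interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit a query as a job, then poll the job URL until a result arrives."""
    status, headers, _ = send("POST", endpoint, query)
    if status != 202:
        raise RuntimeError(f"expected 202 from job submission, got {status}")
    job_url = headers["Location"]

    for _ in range(max_polls):
        status, headers, body = send("GET", job_url, None)
        if status == 202:
            # Still running: keep following the Location header and wait.
            job_url = headers.get("Location", job_url)
            sleep(poll_interval)
            continue
        if status == 200 and body is not None:
            # Assumed completion signal: 200 with the GraphQL response body.
            return body
        raise RuntimeError(f"unexpected status {status} while polling {job_url}")

    raise TimeoutError(f"job at {job_url} did not finish after {max_polls} polls")
```

Passing the transport in as `send` keeps the sketch independent of any particular HTTP library. In practice a client would probably also want backoff between polls rather than a fixed interval.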


Darrel Miller and I met for ☕️ recently and we talked a bit about this problem. One thing he mentioned is that an event stream would actually be great for clients who only care about data. Integrators can then keep things in sync or analyze data however they want. This really resonated with me. If an API client really only cares about raw data and not so much about a business/use-case oriented API, then they might as well connect to some kind of data firehose. The Twitter PowerTrack API is a good example of this, allowing (privileged/enterprise) clients to consume 100% of Twitter’s tweet data.

Best of both worlds?

Maybe a mix of both is what we’re looking for? Register a GraphQL query to filter a firehose of data events, using subscriptions and a separate, more data-oriented schema:

    subscription Comments {
      comments(pullRequests: [...]) {
        comment {
          author {
            ...
          }
        }
      }
    }
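
To make the "registered query as a firehose filter" idea a bit more concrete, here is a rough Python sketch of what the filtering step could look like on the server side. Everything in it is hypothetical: the `CommentEvent` shape, its field names, and the `filter_comments` helper are invented for illustration, and a real subscription implementation would also resolve the selected fields rather than pass whole events through.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class CommentEvent:
    """A hypothetical firehose event for a new pull request comment."""
    pull_request_id: int
    author: str
    body: str


def filter_comments(
    events: Iterable[CommentEvent], pull_requests: Set[int]
) -> Iterator[CommentEvent]:
    """Yield only the comment events on the subscribed pull requests."""
    for event in events:
        if event.pull_request_id in pull_requests:
            yield event
```

Because it is a generator, the filter works the same on a finite list or an unbounded stream: events are checked one at a time and nothing is buffered.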
