Is GraphQL a great choice for East-West Service Communication?
GraphQL is gaining a lot of momentum, and the community is trying to apply it to pretty much every problem on earth 🌎. In this post, we’ll try to see whether it’s a good match for one of those problems: service-to-service communication!
Some of you might already be confused by this post’s title. What do cardinal directions have to do with service communication? North-South and East-West are commonly used to describe types of communication between… computers. East-West usually refers to communication between internal services, or within your data center. For example, in a service-oriented architecture, different services usually need to talk to each other; this is East-West traffic. North-South describes traffic that comes from outside, typically from a user over the internet. To tie the two together: an API gateway usually receives North-South traffic, but often resolves those requests using East-West traffic to talk to fellow services.
We know by now that GraphQL can be a great choice for serving North-South traffic, as an interface to your system. But is it a good idea to use GraphQL for communication between services? Let’s take a look 👀
Network boundaries are useful for a number of reasons: independent scaling, availability, and development speed. These are all reasons some of you have opted for a service-oriented architecture. However, when we go from extremely fast method calls to network calls, we introduce a lot of overhead. This is why one of the first goals when thinking about service-to-service communication is to reduce that overhead as much as possible, and why performance matters so much when choosing a protocol or architecture for communication.
Another thing to keep in mind is that remote calls (over the network) are inherently different from local calls (method calls), and they usually require a very different design. Thankfully, pretty much any technology available to us these days lets us design an API suited to remote calls, as long as we’re careful with that design. This also means we need to think about resiliency, and avoid abstracting the network so much that calls happen behind the scenes, or become inefficient, without us realizing it.
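To make the "don’t hide the network" point concrete, here is a minimal Python sketch, assuming a generic async transport. `call_remote`, `RemoteResult`, and the two fake services are hypothetical names for illustration, not any library’s API. The remote call takes an explicit timeout and a bounded retry count, and returns a result the caller has to inspect, so a failure can’t masquerade as a normal method return.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class RemoteResult(Generic[T]):
    """Outcome of a remote call: either a value or an error description."""
    value: Optional[T] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


async def call_remote(
    send: Callable[[], Awaitable[T]],
    timeout_s: float,
    retries: int = 1,
) -> RemoteResult[T]:
    """Invoke a hypothetical transport callable with a timeout and bounded retries.

    `send` stands in for a real client call (HTTP, gRPC, GraphQL, ...).
    Each iteration starts a fresh attempt, for at most `retries + 1` attempts.
    """
    last_error = "no attempt made"
    for attempt in range(1, retries + 2):
        try:
            value = await asyncio.wait_for(send(), timeout=timeout_s)
            return RemoteResult(value=value)
        except asyncio.TimeoutError:
            last_error = f"timeout after {timeout_s}s (attempt {attempt})"
        except ConnectionError as exc:
            last_error = f"connection error: {exc} (attempt {attempt})"
    return RemoteResult(error=last_error)


# Hypothetical downstream services used to exercise the wrapper.
async def fake_fast_service() -> str:
    return "ok"


async def fake_slow_service() -> str:
    await asyncio.sleep(1.0)  # deliberately slower than a short timeout
    return "late"
```

For example, `asyncio.run(call_remote(fake_slow_service, timeout_s=0.05, retries=0))` comes back quickly with `ok == False` instead of hanging. The design choice is that the signature itself advertises the network: a caller can’t forget there is a timeout and a failure path.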
API evolution is a big one too. We don’t want a field added to one service’s API to require changes to every client in our system. Expand-only types are very useful here: they help us avoid lock-step deploys. Thankfully, most current solutions support this, GraphQL included, so this requirement doesn’t point us toward any particular option just yet.
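Here is a small Python sketch of why expand-only changes avoid lock-step deploys, assuming JSON responses and a "tolerant reader" client. `User`, `parse_user`, and the two sample payloads are hypothetical. The client keeps only the fields it knows about, so a server that adds a field doesn’t break it.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class User:
    id: str
    name: str


def parse_user(payload: str) -> User:
    """Tolerant reader: keep the fields this client knows, ignore additions."""
    raw = json.loads(payload)
    known = {f.name for f in fields(User)}
    return User(**{k: v for k, v in raw.items() if k in known})


# Hypothetical responses from two versions of the same service.
V1_RESPONSE = '{"id": "u1", "name": "Ada"}'
# v2 only *adds* a field (expand-only), so v1 clients keep working.
V2_RESPONSE = '{"id": "u1", "name": "Ada", "email": "ada@example.com"}'
```

Removing or renaming `name` on the server would still break this client (the `User(...)` call would raise `TypeError`), which is why the rule is expand-only. GraphQL gets similar behavior by construction: a response only contains the fields the client’s query selected, so adding a field to the schema doesn’t change existing responses.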
To be fair, when speaking about protocols, lots of people will point out that most GraphQL APIs are served over HTTP/1.1. While that’s probably true, it doesn’t have to be this way. GraphQL is purposefully transport- and protocol-agnostic: nothing stops us from implementing GraphQL over HTTP/2, over UDP, or even as a new GraphQL-specific protocol.
The flip side is that this hasn’t really been tried, and chances are some semantics we’re used to with current GraphQL APIs (200 status codes, headers) would need to change, or at least be adapted.
In general, I’d like to think the protocol or transport layer would be mostly a non-issue if we had strong enough incentives to use GraphQL for service-to-service communication.
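GraphQL’s transport independence is easier to see if you treat a request as a plain payload. The following is a hedged Python sketch: the `query` / `variables` / `operationName` fields follow the common GraphQL-over-HTTP JSON convention, while the 4-byte length-prefix framing is a hypothetical stand-in for a non-HTTP stream transport, and all function names are illustrative.

```python
import json
import struct
from typing import Any, Dict, Optional


def encode_request(
    query: str,
    variables: Optional[Dict[str, Any]] = None,
    operation_name: Optional[str] = None,
) -> bytes:
    """Serialize a GraphQL request envelope, independent of any transport."""
    envelope = {
        "query": query,
        "variables": variables or {},
        "operationName": operation_name,
    }
    return json.dumps(envelope).encode("utf-8")


def decode_request(payload: bytes) -> Dict[str, Any]:
    return json.loads(payload.decode("utf-8"))


def frame(payload: bytes) -> bytes:
    """Hypothetical stream framing: 4-byte big-endian length prefix + payload."""
    return struct.pack(">I", len(payload)) + payload


def unframe(data: bytes) -> bytes:
    """Inverse of frame(); rejects truncated input."""
    if len(data) < 4:
        raise ValueError("missing length prefix")
    (length,) = struct.unpack(">I", data[:4])
    body = data[4:4 + length]
    if len(body) != length:
        raise ValueError("incomplete frame")
    return body
```

The same bytes from `encode_request` could be an HTTP/1.1 POST body, an HTTP/2 stream, or a framed message on a raw socket. What changes per transport is framing and how errors and metadata travel, which is exactly where familiar HTTP semantics like status codes and headers would need rethinking.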
Optimization & Customization
A lot of very important aspects of service communication are achievable whether we use GraphQL or something else. If we want to use GraphQL as our service communication layer, I think we need to ask ourselves why we would choose it over another RPC solution. What does GraphQL really bring to the table that’s different from a REST, Thrift, or gRPC API? A big one is the client-controlled query language, which brings customizability and dynamic execution. I tweeted about this a while ago, and made a diagram to put it more clearly:
This is mostly about REST and GraphQL, but RPC fits in there too. Most RPC calls are heavily optimized for one specific purpose, so they would probably end up toward the far left of this spectrum.
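To make the two ends of the spectrum concrete, here is a hypothetical Python sketch: `get_product_price` is an RPC-style call with one fixed, purpose-built shape, while `select` lets the client choose which fields come back. This is not a GraphQL parser; a nested dict stands in for a selection set, and the product data is made up.

```python
from typing import Any, Dict, Optional

# Hypothetical product store keyed by id.
PRODUCTS: Dict[str, Dict[str, Any]] = {
    "p1": {
        "id": "p1",
        "name": "Desk",
        "price": {"amount": 250, "currency": "USD"},
        "inventory": {"warehouse": "east", "count": 12},
    }
}


def get_product_price(product_id: str) -> Dict[str, Any]:
    """RPC style: a single response shape, tuned for one known use case."""
    product = PRODUCTS[product_id]
    return {"id": product["id"], "amount": product["price"]["amount"]}


# A selection maps a field name to None (leaf field) or to a nested selection.
Selection = Dict[str, Optional["Selection"]]


def select(obj: Dict[str, Any], selection: Selection) -> Dict[str, Any]:
    """Query style: the client decides which fields come back, recursively."""
    result: Dict[str, Any] = {}
    for field, sub in selection.items():
        value = obj[field]
        result[field] = select(value, sub) if sub else value
    return result
```

For instance, `select(PRODUCTS["p1"], {"name": None, "price": {"currency": None}})` returns only the name and currency. The flexible version pushes work to runtime: the server has to accept and execute whatever selection arrives (and, in a real system, validate it and guard against expensive queries), while the RPC function’s cost and shape are known ahead of time.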
You have to think about what kinds of use cases your services serve. From what I’ve seen, services usually serve a set of very clearly defined use cases. Many services only handle server-to-server calls, although some expose an API to a web UI or mobile app. Do we really need that much customizability for a service’s well-defined use cases? I’m not so sure. When performance and reduced overhead matter more than customizability and support for many clients, I would tend to use something other than GraphQL, unless we’re truly feeling the cost of supporting a lot of different use cases with a single service.
Where I think GraphQL can become useful in East-West scenarios is as an aggregator or orchestrator. Whether too much orchestration is good is a different debate, but if we’re talking about synchronous communication, I believe GraphQL can be quite a good aggregator for UI, mobile, public, and internal API consumers. (I’ve talked about GraphQL gateways a bit before.)
A GraphQL aggregator can take on the big responsibility of providing customizability, while our internal services focus on being very performant and optimized. The great thing is that we can handle some resiliency at that level too. One service is down? The aggregator can provide partial responses or even placeholder data. We could certainly write an aggregator or orchestrator service using REST, gRPC, etc., but this is a case where we can actually make use of GraphQL’s declarative and dynamic nature.
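Here is a minimal Python sketch of that partial-response behavior, assuming two hypothetical downstream services (`user_service`, `orders_service`) and a hand-written resolver rather than a real GraphQL server. It fetches both concurrently with a timeout and shapes the result like a GraphQL response: a failed field becomes `null` in `data` and gets an entry with a `path` in `errors`, so the client still receives the parts that succeeded.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


# Hypothetical downstream services.
async def user_service(user_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "Ada"}


async def orders_service(user_id: str) -> List[Dict[str, Any]]:
    await asyncio.sleep(0.01)
    raise ConnectionError("orders service unavailable")  # simulated outage


async def resolve_user_page(user_id: str, timeout_s: float = 0.5) -> Dict[str, Any]:
    """Fan out to each service concurrently and build a GraphQL-shaped response."""
    fetchers: Dict[str, Callable[[], Awaitable[Any]]] = {
        "user": lambda: user_service(user_id),
        "orders": lambda: orders_service(user_id),
    }
    names = list(fetchers)
    # return_exceptions=True keeps one failure from discarding the other results.
    results = await asyncio.gather(
        *(asyncio.wait_for(fetchers[name](), timeout_s) for name in names),
        return_exceptions=True,
    )
    data: Dict[str, Any] = {}
    errors: List[Dict[str, Any]] = []
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            data[name] = None  # partial response: this field is null
            message = str(result) or type(result).__name__
            errors.append({"message": message, "path": [name]})
        else:
            data[name] = result
    response: Dict[str, Any] = {"data": data}
    if errors:
        response["errors"] = errors
    return response
```

Returning a placeholder (say, an empty list for `orders`) instead of `None` would be a one-line change in the failure branch; which one is right depends on whether clients need to distinguish "no orders" from "orders unavailable".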
Another great use of an aggregator or gateway is in the context of an event-driven architecture. The folks at Hasura have called a similar pattern the 3factor architecture.
I think GraphQL can be a great choice in some specific situations. Where I’m uncertain is whether standardizing on GraphQL for service-to-service communication is a good idea. Judging by the needs of typical internal service communication, it doesn’t seem to me that GraphQL was meant to shine in these situations. The features we love about GraphQL come with tradeoffs that I’m not sure we want to make in that context. Overall, I’m much more eager to try GraphQL as an aggregator than as the standard way services communicate.
Have you used GraphQL as a service-to-service communication tool? How’s that going for you? Thanks for reading 💚