You don't need GraphQL to query less data

A simple way to query less data from the server

Introduction

Imagine you’ve been working on an application for a long time. You’ve created many REST endpoints that work fine, but you begin to notice that your queries are slowing down. You suspect it’s because you’re retrieving too much data from the server. You’ve heard that GraphQL lets clients query only the data they need, but migrating the whole application to GraphQL is not an option right now. So you decide to remove redundant fields from the responses, or to create separate endpoints for different views of the client application. That works, but it takes extra time and introduces bugs: you can forget to add a field to one of the endpoints, and so on. But is it really the only way? Let’s find out.

The problem

Let’s imagine that we have a simple REST endpoint which returns a list of users:

[
  {
    "id": 1,
    "name": "John",
    "age": 20,
    "email": "john@mail.com",
    ...
  },
  {
    "id": 2,
    "name": "Jane",
    "age": 25,
    "email": "jane@mail.com",
    ...
  },
  {
    "id": 3,
    "name": "Bob",
    "age": 30,
    "email": "bob@mail.com",
    ...
  }
]

And we have a client application that uses this endpoint to display a list of users. The problem is that we don’t need all the fields from the response; we need only id, name, and email. So we have two options:

  1. Remove redundant fields from the response.
  2. Create a new endpoint which will return only the fields we need.

The first option is the easiest, but it has one big disadvantage:

  • At some point, another part of the application may need the age field, as well as other fields you removed.

The second option is a little harder but more flexible. However, it has disadvantages too:

  • You need to create a new endpoint for every view of the client application.
  • Whenever you add a new field to the user, you need to add it to every endpoint where that field should be available.
  • Usually the team that works on the client application is different from the team that works on the server, so the two teams have to coordinate every time a view needs a new field.

Is there a way to solve this issue without these disadvantages? Certainly. Let’s look into it.

The solution

The solution is to use a query parameter that specifies which fields we want to get from the server. Let’s call it fields. Requests then look like this:

  1. For one view:
     GET /users?fields=id,name,email
  2. For another view:
     GET /users?fields=id,name,age
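For flat field lists like these, the server-side part can be very small. The following is a minimal Python sketch, assuming the handler receives the request URL and a single user record as a dict; the function name pick_root_fields is hypothetical. It only understands top-level field names: if fields is absent the whole record is returned, and unknown names are skipped.

```python
from typing import Any, Dict
from urllib.parse import parse_qs, urlsplit


def pick_root_fields(url: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the top-level fields listed in the URL's fields parameter.

    Hypothetical helper: nested syntax such as country(name) is not handled.
    """
    query = parse_qs(urlsplit(url).query)
    if "fields" not in query:
        return record  # no fields parameter: return the whole record
    # parse_qs returns a list of values per key; use the first occurrence.
    wanted = [name.strip() for name in query["fields"][0].split(",")]
    # Unknown names are skipped instead of raising an error.
    return {name: record[name] for name in wanted if name in record}
```

Plain comma splitting stops working once nested fields with their own inner commas are allowed, which is why the full convention needs a small parser rather than split(",").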

Let’s define some rules and conventions for it.

The rules

Here is the list of rules that, I believe, this approach should follow:

  1. If the fields parameter is not specified, return all fields.
  2. If the fields parameter is specified, return only the fields which are specified in the parameter.
  3. If the fields parameter is specified and a field is not found, just ignore it (do not throw an error or return a value for it).
  4. If the fields parameter consists only of fields that are not found, return an empty object.
  5. If the fields parameter includes nested fields that are not found, but the parent field is found, return the parent field with an empty object as its value.
  6. If the fields parameter includes nested fields but the actual value is not an object or array, do not return anything for that field (see example #5).
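These rules fit naturally into one small recursive function. In the Python sketch below, the parsed fields parameter is assumed to be a nested dict (a "mask") in which each key maps either to None (keep the whole value) or to another mask (keep only those sub-fields); passing None as the whole mask stands for "no fields parameter". The names apply_mask and Mask are hypothetical, and dropping non-object elements when a sub-mask is applied to an array is a choice of this sketch, not something the rules specify.

```python
from typing import Any, Dict, Optional

# A mask maps each requested field name to None (keep the whole value)
# or to another mask (keep only those sub-fields).
Mask = Dict[str, Any]


def apply_mask(data: Any, mask: Optional[Mask]) -> Any:
    """Filter data according to mask, following the rules listed above."""
    if mask is None:
        # Rule 1: no fields parameter, return everything.
        return data
    if isinstance(data, list):
        # Apply the same mask to every element of an array. Elements that are
        # not objects or arrays cannot be masked, so this sketch drops them.
        return [apply_mask(item, mask) for item in data
                if isinstance(item, (dict, list))]
    result: Dict[str, Any] = {}
    for name, sub_mask in mask.items():
        if name not in data:
            # Rule 3: unknown fields are silently ignored.
            # Rule 4 follows: if nothing matches, result stays {}.
            continue
        value = data[name]
        if sub_mask is None:
            # Rule 2: keep the requested field as it is.
            result[name] = value
        elif isinstance(value, (dict, list)):
            # Nested fields: recurse. Rule 5 follows: if none of the nested
            # fields exist, the parent comes back as an empty object.
            result[name] = apply_mask(value, sub_mask)
        # Rule 6: nested fields were requested on a primitive value,
        # so the field is left out entirely.
    return result
```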

The convention

The last thing to think about is the convention for the fields parameter. I think the best option is to make it a string of comma-separated field names.

For example, to request only the id, name, and email fields:

GET /users?fields=id,name,email

Nested fields are represented the same way, inside parentheses right after the parent field, for example:

GET /users?fields=id,name,email,country(name)

The same logic applies to arrays of objects, for example:

GET /users?fields=id,name,email,phones(number)
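Because the parentheses can nest, this convention is easiest to handle with a small recursive-descent parser. The Python sketch below turns a fields string into a nested dict where None marks a leaf field and a dict holds nested fields; parse_fields is a hypothetical name, whitespace around names is tolerated, and unbalanced parentheses raise ValueError. It is an illustration of the syntax, not the parser of any particular library.

```python
from typing import Any, Dict, Optional, Tuple

Mask = Dict[str, Any]  # field name -> None (leaf) or a nested Mask


def parse_fields(text: str) -> Mask:
    """Parse e.g. 'id,country(name)' into {'id': None, 'country': {'name': None}}."""
    mask, pos = _parse_list(text, 0)
    if pos != len(text):
        # The top level stops early only at an unmatched ')' or at text after ')'.
        raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
    return mask


def _parse_list(text: str, pos: int) -> Tuple[Mask, int]:
    """Parse a comma-separated list of fields starting at pos.

    Returns the mask and the position where parsing stopped: the end of the
    text, or the character that ended the list (normally the closing ')').
    """
    mask: Mask = {}
    while pos < len(text):
        # Read a field name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        name = text[start:pos].strip()
        sub: Optional[Mask] = None
        if pos < len(text) and text[pos] == "(":
            # Nested fields: parse the inner list, then expect ')'.
            sub, pos = _parse_list(text, pos + 1)
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        if name:
            mask[name] = sub
        if pos < len(text) and text[pos] == ",":
            pos += 1  # skip the separator and read the next field
        else:
            break
    return mask, pos
```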

Real examples

The examples below use the following data:

[
  {
    "id": 1,
    "firstName": "John",
    "lastName": "Doe",
    "username": "jdoe",
    "email": "john.doe@mail.com",
    "address": {
      "street": "123 Main St.",
      "city": "New York",
      "state": "NY",
      "zip": 10001
    },
    "dateOfBirth": "1980-01-02T00:00:00.000Z",
    "registered": true,
    "country": {
      "name": "United States",
      "code": "US",
      "information": {
        "capital": "Washington, D.C.",
        "population": 320000000
      }
    },
    "emergencyContacts": [
      {
        "firstName": "Jane",
        "lastName": "Doe",
        "phone": "212-555-1234"
      },
      {
        "firstName": "John",
        "lastName": "Smith",
        "phone": "212-555-1234"
      },
      {
        "firstName": "James",
        "lastName": "Johnson",
        "phone": "212-555-1234"
      }
    ],
    "roles": [
      "user",
      "admin"
    ]
  }
]

Example 1

Pick root fields:

GET /users?fields=id,firstName,lastName

Response:

[
  {
    "id": 1,
    "firstName": "John",
    "lastName": "Doe"
  }
]

Example 2

Pick nested fields:

GET /users?fields=id,country(name)

Response:

[
  {
    "id": 1,
    "country": {
      "name": "United States"
    }
  }
]

Example 3

Pick a field that holds an array of primitive values:

GET /users?fields=id,roles

Response:

[
  {
    "id": 1,
    "roles": [
      "user",
      "admin"
    ]
  }
]

Example 4

Pick nested fields from the array:

GET /users?fields=id,emergencyContacts(firstName,lastName)

Response:

[
  {
    "id": 1,
    "emergencyContacts": [
      {
        "firstName": "Jane",
        "lastName": "Doe"
      },
      {
        "firstName": "John",
        "lastName": "Smith"
      },
      {
        "firstName": "James",
        "lastName": "Johnson"
      }
    ]
  }
]

Example 5

Pick nested fields from a primitive field:

GET /users?fields=id(value),roles

Response:

[
  {
    // id is not present in the response
    "roles": [
      "user",
      "admin"
    ]
  }
]

Extending the pattern

The rules described above cover the vast majority of cases, but sometimes you may need to extend the pattern. Here is a list of possible extensions:

  1. Regular expressions. For example, you may want to get all fields whose names start with name. This is not the best practice, since matching every field name against a regular expression can be slow, but it can be useful in some cases.
  2. Getting only the first (or last) N elements of an array. For example, you may want only the first 5 elements. This can be useful when an array is huge and you want to reduce the response size.
  3. Getting a defined list of fields from every object inside some field (something like instruments(*(id, name))).
  4. Filtering by some condition (like order(*[ > 5])), but this is a rare case, since it’s usually better to add such a filter option directly to the API.
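The first two extensions are easy to prototype on a plain dict before committing to any query syntax. The Python sketch below is only an illustration: pick_matching and limit_array are hypothetical helpers, and how a client would ask for them in the fields parameter is deliberately left open.

```python
import re
from typing import Any, Dict


def pick_matching(record: Dict[str, Any], pattern: str) -> Dict[str, Any]:
    """Hypothetical extension 1: keep top-level fields whose name matches a regex."""
    regex = re.compile(pattern)  # compile once instead of once per key
    return {key: value for key, value in record.items() if regex.search(key)}


def limit_array(record: Dict[str, Any], field: str, n: int) -> Dict[str, Any]:
    """Hypothetical extension 2: keep only the first n elements of an array field."""
    result = dict(record)  # shallow copy, the original record is not modified
    if isinstance(result.get(field), list):
        result[field] = result[field][:n]
    return result
```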

Useful Libraries

Node.js

The library json-mask is a good choice for Node.js applications. It fully covers all the rules and conventions described above (except the extensions).

The library express-partial-response uses json-mask under the hood and provides ready-made middleware for Express.js.

Python

The library jsonmask is a good choice for Python applications. It fully covers all the rules and conventions described above (except the extensions).

The library django-rest-framework-queryfields has similar functionality, but it is designed for Django REST Framework.