(Fernand is the new helpdesk service we are building at the time of writing. It's not publicly live yet, even though we've already spent 18 months on it. The release is coming soon!)
When I first developed the frontend part of Fernand, I went the naive way: the frontend would call each API endpoint when needed, following RESTful conventions.
That meant loading all the conversations when you'd hit the "/tickets/" page. That meant waiting for all the conversations and contact details to load.
Granted, I've spent a fair amount of time on improving that endpoint, but it still took about a second to render it properly.
But this wasn't the main issue. The main issue was that the content of each conversation (messages, responses, notes, and events) wasn't loaded at that point, because we couldn't know in advance which one you'd open!
So once the loading was done for the listing, we would wait for the user to make a decision before loading what was needed.
When you opened a conversation thread, the frontend would then call the API to load the messages, the contact details (subscription status, latest payments, etc.) and the list of related conversations from that same user.
That loading could take some time, about two seconds, which resulted in a bad user experience: you had to wait every time you switched between tickets.
Not. Great.
That's why I decided to change the way we load the data. The idea was to constantly load the data we expect the user to view before they take any action. That way, once they click on a conversation, all the details they need are available within a few milliseconds.
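To make the idea concrete, here is a minimal sketch of a prefetch cache. It is not Fernand's actual code: `PrefetchCache`, `ConversationDetails` and the injected `Loader` function are hypothetical names, and the loader stands in for whatever API call fetches a conversation's details.

```typescript
// Hypothetical shape: the real data model is not described in this post.
interface ConversationDetails {
  id: number;
  messages: string[];
}

// Assumed to wrap the API call that loads one conversation's details.
type Loader = (id: number) => Promise<ConversationDetails>;

class PrefetchCache {
  // Stores the promise itself, so concurrent callers share a single request.
  private readonly entries = new Map<number, Promise<ConversationDetails>>();
  private readonly load: Loader;

  constructor(load: Loader) {
    this.load = load;
  }

  // Start loading conversations the user is likely to open next.
  prefetch(ids: number[]): void {
    for (const id of ids) {
      // Errors are ignored here; a later get() will retry the load.
      this.get(id).catch(() => {});
    }
  }

  // Return the cached (or in-flight) result, loading on demand otherwise.
  get(id: number): Promise<ConversationDetails> {
    let entry = this.entries.get(id);
    if (entry === undefined) {
      entry = this.load(id).catch((err: unknown) => {
        // Drop failed loads so they are not cached forever.
        this.entries.delete(id);
        throw err;
      });
      this.entries.set(id, entry);
    }
    return entry;
  }
}
```

With something like this in place, the listing page could call `prefetch` with the IDs it just rendered, and the click handler would call `get`, which resolves immediately when the data is already there.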
The trick in all of that is to take our users' constraints into consideration: we might hit a memory limit if we load too much, we could saturate their bandwidth or, worse, use up their whole data allowance!
Here are my findings.
First approach: Load the linked data in the same request
My first test was to load all the data related to a conversation in the same request. This forced me to rewrite part of the API so that it returned the requested batch of conversations with extended information embedded in each one (messages, events, notes, contact, related conversations and payment information).
I implemented a straightforward approach on the API side. I'm pretty sure I could have improved the database queries, but I first wanted to see what the result would be.
Loading a batch of 25 conversations per request took, on average, 3960 ms. Loading the 1600 conversations I have on my local machine took about 4 minutes and 4 seconds.
Keep in mind that in order to be useful, there is no need for the user to load the entire 1600 conversations! The first batch of 25 is enough to get started, and the rest is there in case the user needs it. That means that once the first request is done, the user can start using the app.
... but still, having to wait 4 seconds to get started is too much for Fernand, so we need to improve that.
So here comes the second approach.
Second approach: Loading per batch of types
The previous method had a major drawback: because of the "cursor" pagination, we had to wait for each request to finish before starting the next one. We couldn't just ask for items n, n+1, n+2 and n+3 in parallel.
Waiting on one long request is not ideal, so instead we rewrote the backend to return the data flat: every related instance (messages, contacts, etc.) would then be loaded by the frontend if and when needed.
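The constraint is easier to see in code. Below is a minimal sketch of a cursor-paginated loader; `Page`, `FetchPage` and `loadAll` are hypothetical names, and the `nextCursor` field is an assumption about what a cursor-based API typically returns, not Fernand's actual response format.

```typescript
interface Page<T> {
  items: T[];
  // Assumed convention: null means there are no more pages.
  nextCursor: string | null;
}

// Stands in for the API call; null requests the first page.
type FetchPage<T> = (cursor: string | null) => Promise<Page<T>>;

// Loads every page in order and returns how many requests were needed.
async function loadAll<T>(
  fetchPage: FetchPage<T>,
  onPage: (items: T[]) => void,
): Promise<number> {
  let cursor: string | null = null;
  let requests = 0;
  do {
    // The cursor for the next page only exists once this response arrives,
    // so the requests cannot be issued in parallel.
    const page: Page<T> = await fetchPage(cursor);
    requests += 1;
    onPage(page.items);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return requests;
}
```

Each iteration has to `await` the previous response before it knows what to ask for next, which is why the total time grows with the number of batches.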
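As an illustration of what "flat" means here, this sketch contrasts a nested payload with its flat equivalent. All type and field names (`NestedConversation`, `FlatConversation`, `contactId`, `messageIds`) are hypothetical; the post does not show the real payloads.

```typescript
interface Contact {
  id: number;
  name: string;
}

interface Message {
  id: number;
  body: string;
}

// First approach: related data is embedded in every conversation.
interface NestedConversation {
  id: number;
  contact: Contact;
  messages: Message[];
}

// Second approach: the conversation only carries the IDs of related data.
interface FlatConversation {
  id: number;
  contactId: number;
  messageIds: number[];
}

// Converts the nested shape into the flat one by keeping only the IDs.
function flatten(conversation: NestedConversation): FlatConversation {
  return {
    id: conversation.id,
    contactId: conversation.contact.id,
    messageIds: conversation.messages.map((message) => message.id),
  };
}
```

The flat shape is much smaller, and a contact shared by several conversations appears as a repeated ID rather than a repeated object.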
Here's how we did it:
When loading the conversation listing, a request asking for the first 25 tickets would be sent. The data would be returned with IDs instead of linked data. Then, the frontend would push every ID that needed to be loaded into a "batch" queue, per type (one queue of IDs for the contacts, one for the messages, one for the related conversations, etc.). That batch system would then fire a new network request every time a queue reached 10 entries, or when the loader had finished.
Doing so has tremendous advantages:
- We only add to the batch queue the IDs we haven't already loaded and that aren't already in the queue. That means that if the same ID appears multiple times (the same contact, for instance), we only load it once.
- By loading each item only once, we reduce the number of database queries.
- We also reduce the size of the payload.
So definitely a bigger win.
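The batch system described above can be sketched roughly as follows. This is a simplified illustration rather than Fernand's implementation: `BatchQueue`, `SendBatch` and `flushAll` are hypothetical names, and `send` stands in for the network request that loads a list of IDs of a given type.

```typescript
// Assumed to fire one network request for the given type and IDs.
type SendBatch = (type: string, ids: number[]) => void;

class BatchQueue {
  // IDs waiting to be requested, per type ("contact", "message", ...).
  private readonly pending = new Map<string, number[]>();
  // IDs already queued, requested or loaded, per type.
  private readonly known = new Map<string, Set<number>>();
  private readonly send: SendBatch;
  private readonly batchSize: number;

  constructor(send: SendBatch, batchSize = 10) {
    this.send = send;
    this.batchSize = batchSize;
  }

  add(type: string, id: number): void {
    let seen = this.known.get(type);
    if (seen === undefined) {
      seen = new Set<number>();
      this.known.set(type, seen);
    }
    if (seen.has(id)) return; // already handled: load each ID only once
    seen.add(id);

    let queue = this.pending.get(type);
    if (queue === undefined) {
      queue = [];
      this.pending.set(type, queue);
    }
    queue.push(id);
    // Fire a request as soon as the queue for this type is full.
    if (queue.length >= this.batchSize) this.flush(type);
  }

  // Send whatever is waiting for one type, even if the batch is not full.
  flush(type: string): void {
    const queue = this.pending.get(type);
    if (queue === undefined || queue.length === 0) return;
    this.pending.set(type, []);
    this.send(type, queue);
  }

  // Meant to be called once the listing loader has finished.
  flushAll(): void {
    for (const type of Array.from(this.pending.keys())) this.flush(type);
  }
}
```

In this sketch the frontend would call `add` for every ID found in a listing response and `flushAll` once the listing is done, so partially filled queues are not left waiting.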