Coming up with a solution

It took me a while to write this post, but I actually reached a solution a few days after publishing the previous one.

While considering a migration to Firebase, I started digging into how Firebase works under the hood. I liked the idea of retrieving data very fast, but I was curious how it could send a lot of data in under 200 ms.

Turns out it doesn't. What it does is keep a connection alive and deliver whatever the queue holds for that connection. If data arrives between two requests, it is queued and sent once the next request is open.
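To make the queueing idea concrete, here is a minimal sketch in Python using only asyncio. It is not Firebase's code: the `Mailbox` class, its method names and the 25-second timeout are all assumptions made up for illustration. Each client gets a queue, and one call to `poll` stands in for one long-polling request.

```python
import asyncio


class Mailbox:
    """Per-client queue: messages wait here until the client polls again."""

    def __init__(self) -> None:
        self._pending: asyncio.Queue = asyncio.Queue()

    def publish(self, message: dict) -> None:
        # Called whenever data is ready, whether or not a poll is open.
        self._pending.put_nowait(message)

    async def poll(self, timeout: float = 25.0) -> list:
        """Model one long-polling request: wait for data, then drain the queue."""
        try:
            first = await asyncio.wait_for(self._pending.get(), timeout)
        except asyncio.TimeoutError:
            return []  # nothing arrived in time; the client simply polls again
        messages = [first]
        while not self._pending.empty():
            messages.append(self._pending.get_nowait())
        return messages
```

Anything published between two polls stays in the queue, so the next poll returns it immediately instead of waiting.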

I tried to implement something similar, and the result was crazy. I could send a "retrieve" request (a GET, in RESTful terms) that would respond immediately with a 202 Accepted status code. Once the requested data was ready, it would be published on the open long-polling request, or held pending until such a request was open.

Of course, I had to rewrite quite a bit of code, but I ended up with less code overall, and clearer code at that. Win-win!

After a few days of working on it, I discovered a problem. Browsers limit how many concurrent XHR requests, long-polling ones included, can be open to a given domain (typically around six per host over HTTP/1.1). This means that any user who opens many tabs will quickly run into issues.

That's when I started to dig more into websockets. I had already come across them while researching long-polling, but I wrongly believed they were not widely supported.

It turns out that websockets are ready and work very well in all modern browsers. Moreover, I'm using Sanic on the backend, which has a very good websocket implementation. Migrating the long-polling request to a websocket was very easy, and the result was impressive.

So, how does it work?

It's quite simple in the end.

I always try to make my server-side structure follow the REST pattern as closely as I can. I did the same here, but added a realtime layer on top.

Concretely, only one method retrieves data: GET. The other commonly used methods (PATCH, PUT, POST, DELETE) are actions that need validation and acknowledgement, so you can't fire-and-forget them.

I've added a custom option to my GET functions that, when enabled, responds immediately with a 202 Accepted status code and then carries on executing the requested work (listing the users, for instance).

Instead of being returned to the client in the response, the results are fetched one by one for faster processing and sent to a queue along with a "request identifier".

On the consumer end of the queue, a websocket event is sent to the user with the request ID and the available data. Once all the data has been sent, a "done" event carrying the request ID tells the client that the transfer is complete.
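The flow described above can be sketched with plain asyncio, independently of any web framework. This is an illustration under assumptions, not the actual implementation: `fetch_users`, `produce`, `handle_get`, `consume` and the event field names are hypothetical, and `send` stands in for whatever pushes a message over the user's websocket.

```python
import asyncio
import uuid

ACCEPTED = 202  # HTTP 202 Accepted
_background = set()  # keeps strong references to running tasks


async def fetch_users():
    """Hypothetical data source that yields rows one by one."""
    for name in ("alice", "bob"):
        await asyncio.sleep(0)  # stands in for a database round trip
        yield {"name": name}


async def produce(request_id: str, queue: asyncio.Queue) -> None:
    # Each row is queued as soon as it is fetched, tagged with the request ID.
    async for row in fetch_users():
        await queue.put({"event": "data", "request_id": request_id, "data": row})
    await queue.put({"event": "done", "request_id": request_id})


def handle_get(request_id: str, queue: asyncio.Queue) -> int:
    """Schedule the work and answer immediately, before any row is fetched."""
    task = asyncio.create_task(produce(request_id, queue))
    _background.add(task)
    task.add_done_callback(_background.discard)
    return ACCEPTED


async def consume(queue: asyncio.Queue, send) -> None:
    """Forward queued events to the client; this sketch stops at 'done'."""
    while True:
        event = await queue.get()
        await send(event)
        if event["event"] == "done":
            break


async def demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    received = []

    async def send(event: dict) -> None:  # stand-in for a websocket send
        received.append(event)

    request_id = str(uuid.uuid4())  # in the real flow the browser picks this
    status = handle_get(request_id, queue)
    await consume(queue, send)
    return status, request_id, received
```

Running `asyncio.run(demo())` returns the status code first, then the events that reached the client. A real consumer would keep running and route each event to the right connection rather than stopping at the first "done".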

The functions handling these GET requests and sending the data to the queue are built in a hybrid way: they send the resulting data one by one to the queue for the websocket, but only if a specific header asks for that behavior (simply the request ID set by the browser, which is then used to match the responses received over the websocket). If that header is missing, the data is returned as a standard JSON API response.
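Here is a small sketch of that branching, kept synchronous so that only the header check is on display (in the real flow the work would continue after the 202 has been sent). The header name `x-request-id`, the `list_users` source and the `outbox` list standing in for the websocket queue are assumptions made for the example.

```python
import json

REQUEST_ID_HEADER = "x-request-id"  # hypothetical header name


def list_users():
    """Hypothetical data source."""
    yield {"name": "alice"}
    yield {"name": "bob"}


def get_users(headers: dict, outbox: list) -> tuple:
    """Return (status, body); `headers` is assumed to use lowercase keys."""
    request_id = headers.get(REQUEST_ID_HEADER)
    if request_id is None:
        # Plain REST client: return everything as one JSON response.
        return 200, json.dumps(list(list_users()))
    # Realtime client: push rows one by one, then a "done" marker.
    for row in list_users():
        outbox.append({"event": "data", "request_id": request_id, "data": row})
    outbox.append({"event": "done", "request_id": request_id})
    return 202, ""
```

The same endpoint therefore stays usable from curl or any ordinary HTTP client, which simply never sends the header.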

POST, PUT, PATCH and DELETE requests are synchronous and always respond once the requested work has been done, with the updated information (or the removed information, in the case of a DELETE). In parallel, an event is fired to all the websocket connections to notify them of the change.

That way, when a user updates some data, all the other connected users have their data updated in realtime.

Of course, a security layer is added so that only users belonging to the same organization receive these updates. This means that an organization can't access data from another organization or be notified of its changes.
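The write-then-notify behavior, scoped per organization, could look roughly like the following. It is only a sketch: the `Hub` class, the `update_user` function and the event shape are hypothetical, and each `send` callable stands in for one open websocket connection.

```python
import asyncio
from collections import defaultdict


class Hub:
    """Tracks open connections per organization and fans out change events."""

    def __init__(self) -> None:
        self._connections = defaultdict(set)  # org_id -> set of send callables

    def connect(self, org_id: str, send) -> None:
        self._connections[org_id].add(send)

    def disconnect(self, org_id: str, send) -> None:
        self._connections[org_id].discard(send)

    async def broadcast(self, org_id: str, event: dict) -> None:
        # Only connections registered under the same organization are notified.
        senders = list(self._connections.get(org_id, ()))
        await asyncio.gather(*(send(event) for send in senders))


async def update_user(hub: Hub, org_id: str, users: dict,
                      user_id: str, changes: dict) -> dict:
    """Apply the change, notify the organization, return the updated record."""
    users[user_id].update(changes)
    updated = dict(users[user_id])
    await hub.broadcast(org_id, {"event": "updated", "type": "user", "data": updated})
    return updated
```

Keying the connection registry by organization means the isolation rule is enforced where the event is sent, rather than relying on each client to ignore events it should not see.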

This solution is close to what Firebase offers, except that the logic (filtering, ordering, etc.) runs on the server side rather than on the client side. That's simply a decision to reduce the amount of data the user has to handle. All data received while connected is stored in the browser's memory, and the code listens for changes (update/delete) and applies them to the stored data.
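The in-memory store lives in the browser, so the real thing would be JavaScript. To stay in one language, the logic is modeled here in Python instead. The `Store` class, the event names and the assumption that every record carries an `id` field are all illustrative choices, not the actual client code.

```python
class Store:
    """In-memory cache keyed by record ID, kept current by change events."""

    def __init__(self) -> None:
        self.records = {}

    def apply(self, event: dict) -> None:
        kind = event["event"]
        record = event["data"]
        if kind in ("data", "created", "updated"):
            # Merge the incoming fields into the cached copy, creating it if needed.
            self.records.setdefault(record["id"], {}).update(record)
        elif kind == "deleted":
            # Removing an unknown record is a no-op rather than an error.
            self.records.pop(record["id"], None)
```

Every event received over the websocket goes through `apply`, so the views read from `records` and never need to ask the server again for data they already hold.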

When the user closes the tab and reopens it later, the first connection loads the essential data again, then receives any subsequent changes and applies them.

This is done simply because it would be too complicated to track the database state as it was right before the user closed the tab and send only the list of changes when they come back. The data sent when reopening the service is small, and with current Internet bandwidth it only takes a few ms to load fully. Moreover, the data is loaded in the background while the application shows key elements, so by the time the user accesses this data, it is already present.

Conclusion

I hope this was interesting. The whole quest of finding an efficient, fast and always-up-to-date interface was quite challenging and made me rethink how we load and access data.

This required me to rethink and rewrite how the backend behaves, and how the frontend could rely on the browser to speed up the process.

A future iteration would rely even more on browser features to store data (IndexedDB, for instance) in order to speed up the rendering of the application, while still reloading everything from the server and updating what is needed.

Taking slow browsers and slow internet connections into consideration will be something else to dig into, which makes this project a nice and complex technical challenge.