Implementing Multithreading in SwapCacheDb

Hey all, in this post I will be going through the design I implemented to reduce blocking behavior in SwapCacheDb queries by introducing multithreading using Quirk.

Introduction

If you’ve taken a look at my post My Journey Building a NoSql Database in Typescript, you’re probably aware of some blocking behavior in SwapCacheDb that led to serious performance problems when querying the db. If not, the short version is that queries were taking a long time to complete and were blocking the Node.js event loop, so subsequent requests could not run until the query finished. This resulted in obscene increases in latency for get, put, and basically every other operation. We’re talking 800+ ms get calls when the average is normally 3 ms with a p99.9 of ~9 ms. That is obviously unacceptable, so I got to work fixing it.

Node.js is “Single Threaded”

By default, Node.js is thought of as “single threaded”. I put this in quotes because it is a little more nuanced than that. There is an event loop thread that picks up tasks and runs them, but I/O-related work can happen in the background, which frees the event loop for other requests. This means that certain tasks can execute concurrently.

The problem we were seeing is that queries ran in a blocking manner: reading all records, performing query filtering, and returning the results, all on the event loop thread. Like I mentioned, this blocked all subsequent requests. So how would I fix it?
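To make the blocking concrete, here is a minimal sketch. The `blockingQuery` function and the numeric records are hypothetical stand-ins, not SwapCacheDb code. A zero-delay timer plays the role of the second user's request: it cannot fire until the synchronous scan has finished.

```typescript
// Hypothetical stand-in for a blocking query: scans every record synchronously.
function blockingQuery(records: number[], predicate: (r: number) => boolean): number[] {
  const matches: number[] = [];
  for (const r of records) {
    if (predicate(r)) matches.push(r);
  }
  return matches;
}

const records: number[] = Array.from({ length: 1000000 }, (_, i) => i);

// Stand-in for a second request arriving right after the query starts.
const scheduledAt = Date.now();
setTimeout(() => {
  console.log(`second request ran after ${Date.now() - scheduledAt} ms`);
}, 0);

// The event loop cannot run the timer callback until this call returns.
const matches = blockingQuery(records, (r) => r % 2 === 0);
console.log(`query returned ${matches.length} matches`);
```

The "query returned" line always prints first, and the delay reported by the timer grows with the size of the scan.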

Multithreading in Typescript

Another reason I put “single threaded” in quotes is that Node.js does actually support threads via worker threads. Similar to other languages, you can have a thread worker file with logic that runs in a separate thread. So I immediately went down the route of implementing it. Before going any further though, let’s go over the architecture as it was before multithreading.
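For anyone who has not used worker threads yet, here is a minimal sketch using only Node's built-in `worker_threads` module. The `PersonRecord` type and `queryInWorker` function are made up for illustration. To keep everything in one file, the worker body is passed as a JavaScript string with `eval: true`; a real project would more likely put it in its own worker file.

```typescript
import { Worker } from "node:worker_threads";

interface PersonRecord {
  id: string;
  age: number;
}

// Worker body as plain JavaScript so the sketch fits in one file.
const oneShotWorkerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  // The CPU-bound filtering happens here, off the main thread.
  const matches = workerData.records.filter((r) => r.age >= workerData.minAge);
  parentPort.postMessage(matches);
`;

// Spawns a worker for a single query and resolves with its result.
function queryInWorker(records: PersonRecord[], minAge: number): Promise<PersonRecord[]> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(oneShotWorkerSource, {
      eval: true,
      workerData: { records, minAge },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

queryInWorker(
  [
    { id: "a", age: 20 },
    { id: "b", age: 35 },
  ],
  30
).then((matches) => console.log(matches.map((m) => m.id)));
```

Spawning a worker per query has startup cost, which is one reason to keep workers alive in a pool instead.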

Existing Architecture

In the diagram above I show how the blocking behavior occurs. Basically, the entire SwapCacheDb implementation lives in the main thread within the SwapCacheDbWebService. Each request comes through the api, then into SwapCacheDb, and routes to SwapCache, which performs read operations on the filesystem. Once results are returned from the file system, the records are passed to swap query for filtering, and the matches are aggregated and returned to the user. I also show that the subsequent get request comes in immediately after the query request, but is blocked until the query completes in step 8.

The Fix

So how should I go about fixing this? Well, the nice thing is that SwapCacheDb fully encapsulates the db logic, and SwapCacheDbWebService simply delegates to it. Therefore, my thought was to have a separate instance of the db in the thread worker and pass a command to that worker to perform the querying logic. For this, I used Quirk, a wrapper library for thread workers and processes that manages the asynchronous communication layer between the worker and the main thread where the web service is running. This ultimately looks like:

With this new implementation, I have a MultiJobQuirkPoolWorker that performs the long-running query operation. When the request comes into SwapCacheDbWebService, it is passed to the QuirkPool, skipping the main thread logic from the old blocking implementation. From there, Quirk sends the request details via a message to the underlying pool worker (thread worker), where basically the old logic executes. Once complete, the result is passed back, again via a message, to the main thread. In the main thread, we await the result asynchronously, and once it arrives we simply return the data to the user.
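The message round trip described above can be sketched with plain `worker_threads`. To be clear, this is not Quirk's API: `QueryWorkerClient`, the `QueryCommand` and `QueryResult` shapes, and the in-memory records are all hypothetical, and a single long-lived worker stands in for the pool. The idea it illustrates is tagging each command with an id so the main thread can match a reply to the promise it is awaiting.

```typescript
import { Worker } from "node:worker_threads";

// Hypothetical message shapes; Quirk's real protocol may differ.
interface QueryCommand {
  id: number;
  minAge: number;
}
interface QueryResult {
  id: number;
  keys: string[];
}

// Worker body as plain JavaScript so the sketch fits in one file.
const poolWorkerSource = `
  const { parentPort } = require("node:worker_threads");
  // Stand-in for the worker's own db instance.
  const records = [
    { key: "a", age: 25 },
    { key: "b", age: 40 },
    { key: "c", age: 61 },
  ];
  parentPort.on("message", (command) => {
    const keys = records.filter((r) => r.age >= command.minAge).map((r) => r.key);
    // Echo the id so the main thread can match the reply to its request.
    parentPort.postMessage({ id: command.id, keys });
  });
`;

class QueryWorkerClient {
  private readonly worker = new Worker(poolWorkerSource, { eval: true });
  private readonly pending = new Map<number, (result: QueryResult) => void>();
  private nextId = 0;

  constructor() {
    this.worker.on("message", (result: QueryResult) => {
      const resolve = this.pending.get(result.id);
      this.pending.delete(result.id);
      if (resolve) resolve(result);
    });
  }

  // Returns a promise right away; the filtering itself runs on the worker.
  query(minAge: number): Promise<QueryResult> {
    const command: QueryCommand = { id: this.nextId++, minAge };
    return new Promise((resolve) => {
      this.pending.set(command.id, resolve);
      this.worker.postMessage(command);
    });
  }

  // Stops the worker so the process can exit.
  close(): Promise<number> {
    return this.worker.terminate();
  }
}

async function demo(): Promise<string[]> {
  const client = new QueryWorkerClient();
  try {
    const result = await client.query(30);
    return result.keys;
  } finally {
    await client.close();
  }
}

demo().then((keys) => console.log(keys));
```

Because `query` only posts a message and returns a promise, the main thread stays free to serve other requests while the worker filters.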

This async, multithreaded operation then unblocks user 2’s request, which completes just after the query request makes it to the Quirk pool. This results in a very slight increase in the get latency of about 2-4 ms, far less than the 800+ ms latency we were seeing in the original implementation.

Prove that it’s using multiple threads

Some think that multithreading in Node.js is fake, so let’s take a look at the system resources being consumed by the multithreaded version!

There’s your proof: all threads executing in parallel. Compare this to the original usage:

As you can see, only a single thread is being used. Don’t be tricked by the threads going up and down. That comes from the context switching the CPU is doing, basically switching which thread is doing the work. But you can consistently see that a single thread is taking most of the load.
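Besides watching a system monitor, you can also check from inside the process. The sketch below uses the `threadId` value exported by Node's `worker_threads` module; the helper names and the busy loop are made up for illustration. Each worker burns a little CPU and then reports which thread it ran on.

```typescript
import { Worker, threadId as mainThreadId } from "node:worker_threads";

// Worker body as plain JavaScript so the sketch fits in one file.
const busyWorkerSource = `
  const { parentPort, threadId } = require("node:worker_threads");
  // Burn some CPU so the load shows up in a system monitor.
  let x = 0;
  for (let i = 0; i < 2e7; i++) x += i % 7;
  parentPort.postMessage(threadId);
`;

// Runs the busy loop on one worker and resolves with that worker's thread id.
function runOnWorker(): Promise<number> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(busyWorkerSource, { eval: true });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

// Starts all workers up front so their busy loops overlap.
function runOnWorkers(count: number): Promise<number[]> {
  return Promise.all(Array.from({ length: count }, () => runOnWorker()));
}

runOnWorkers(4).then((ids) => {
  console.log(`main thread id: ${mainThreadId}, worker thread ids: ${ids.join(", ")}`);
});
```

Every worker reports a thread id different from the main thread's and from the other workers', which matches what the resource graphs show.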

Wrapping up

I hope this post provides some value to others. It was very rewarding to get this fixed and squeeze the maximum amount of performance (for now) out of SwapCacheDb. Parallel computing can be very difficult, which is why I made Quirk. Quirk takes the complexity out of multithreading and allows you to focus on writing the business logic. If you’re interested in me going into more detail on Quirk in another post, leave a comment or reach out to me on social media. I’ve learned a lot, and I’m always excited to share my learnings on whatever topics interest you, just let me know!

May your code always be optimized!

J
