Software Engineering
queue producer-consumer
Updated Sun, 25 Sep 2022 16:15:50 GMT

What kind of queue should I use for processing a large volume of data?

I have a segment of customers whose size may range from 1 to 5000. For each customer, I need to do some processing by running a database query built from that customer's data. The query is expensive, which is why I plan to use a queue to handle these tasks.

I figured a simple queue with one producer and one consumer won't be able to scale. The server is written in Node.js.

Say I get a list of 500 customers, and assume it's an array of unique ids [10, 20, 30, ...500].

Now I loop through the array and push each id onto the queue like this:

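customers.forEach((id) => {
    // `queue.add` is a placeholder for whatever enqueue call the queue library provides
    queue.add({ customerId: id });
});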

Now for each customer, I need to run an expensive query when the customer id is consumed.

Here is my question: if I run a query for each customer as it is consumed, isn't that the same as 500 customers performing queries at the same time, even without a queue? So did the queue really offload the tasks?

What type of design do I need to implement in order to solve this issue?


Queues do not offload tasks. Queues just... queue up tasks. A queue can be useful when one part of the system produces a burst of tasks or data and another part processes those tasks or data more slowly.
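As a rough sketch of that burst-then-drain pattern in Node.js: the producer dumps every customer id into an in-memory list at once, and a fixed number of workers drain it so that only a few queries are in flight at any moment. The names `processWithLimit`, `runQuery` and `limit` are made up for illustration, and `runQuery` stands in for the expensive database query. This is not a replacement for a real queue library.

```javascript
// Minimal in-memory work queue with a concurrency cap (illustrative only).
// `runQuery` is a hypothetical stand-in for the expensive per-customer query.
async function processWithLimit(customerIds, limit, runQuery) {
  const pending = [...customerIds]; // the queue: ids waiting to be processed
  const results = new Map();        // customer id -> query result
  let active = 0;                   // queries currently in flight
  let peak = 0;                     // highest number in flight at once

  async function worker() {
    while (pending.length > 0) {
      const id = pending.shift();
      active += 1;
      peak = Math.max(peak, active);
      try {
        results.set(id, await runQuery(id));
      } finally {
        active -= 1;
      }
    }
  }

  // Start at most `limit` workers; each pulls the next id as soon as it is free.
  const workerCount = Math.min(limit, pending.length);
  const workers = Array.from({ length: workerCount }, () => worker());
  await Promise.all(workers);
  return { results, peak };
}
```

Calling `processWithLimit(ids, 4, runQuery)` would never have more than four queries running at once, however many ids arrive in the burst. The total work is unchanged; it is only spread over a longer time. In this sketch a failed query rejects the whole call, so real code would likely want per-task error handling.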

If the task is idempotent (running it more than once has the same effect as running it once), a queue can also help to reduce redundant executions: if the task is already in the queue, then you don't add it again.
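One way to picture the "don't add it again" idea is a queue that remembers which keys are currently waiting. The class name `DedupingQueue` and its methods are hypothetical, and the sketch assumes each task can be identified by a key such as the customer id.

```javascript
// Queue that ignores a task if the same key is already waiting (illustrative only).
class DedupingQueue {
  constructor() {
    this.items = [];             // waiting tasks in FIFO order
    this.queuedKeys = new Set(); // keys of the tasks currently waiting
  }

  // Returns true if the task was added, false if that key was already queued.
  enqueue(key, task) {
    if (this.queuedKeys.has(key)) return false;
    this.queuedKeys.add(key);
    this.items.push({ key, task });
    return true;
  }

  // Removes and returns the oldest task, or undefined if the queue is empty.
  dequeue() {
    const entry = this.items.shift();
    if (entry === undefined) return undefined;
    this.queuedKeys.delete(entry.key);
    return entry.task;
  }

  get size() {
    return this.items.length;
  }
}
```

With this shape, enqueuing customer 10 twice before it is consumed leaves a single entry in the queue. Once it has been dequeued, the same id can be queued again.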

But if you need to run the query more times in a given period than the server can actually handle, then you need a faster server (or group of servers) or a faster query. Queuing won't help you.
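A back-of-the-envelope calculation makes the limit concrete. The function name `estimateBatchSeconds` and all the figures below are assumptions chosen for illustration, not measurements from the question.

```javascript
// Rough lower bound on how long a batch takes, with or without a queue (illustrative only).
// Every number passed in is an assumption about your own system.
function estimateBatchSeconds(taskCount, secondsPerQuery, maxConcurrentQueries) {
  // Each "round" runs up to maxConcurrentQueries queries side by side.
  const rounds = Math.ceil(taskCount / maxConcurrentQueries);
  return rounds * secondsPerQuery;
}

// Hypothetical figures: 500 customers, 2 s per query, 5 queries at a time.
console.log(estimateBatchSeconds(500, 2, 5)); // 200
```

If 200 seconds is too slow, the only levers in this model are a shorter query time or more capacity to run queries side by side. Adding a queue changes neither.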

Comments (1)

  • +0 – Thank you for the explanation. — Jul 26, 2022 at 16:15