Software Engineering
multithreading synchronization producer-consumer
Updated Fri, 19 Aug 2022 14:03:09 GMT

One producer - multiple consumers queue, best way to guard against rare duplicates?

I have a classic single-producer, multiple-consumer queue in which each item can be processed independently. On rare occasions (<1%), several queue items can refer to the same object a consumer is processing. That is perfectly fine, with one constraint: an object (potentially referenced by more than one queue item) must not be processed by more than one consumer simultaneously. The second queue item would then have to wait until the first queue item is done processing the shared object.

Obviously, I don't want to force the consumers to be completely single-threaded (99% of the queue items can be processed independently), but at the same time I don't want to create a critical section or mutex for each object (100 or so critical sections stored in a dictionary does not look good). Is there a better design pattern?


[As discussed in comments: OP is moving towards an actual queue implementation - a genuine single-producer multiple-consumer pattern.]

Here's one approach based on a queue:

Have a map from underlying item to a queue of tasks to be performed.

There is also a single critical section protecting two data structures: 1) the main task queue (at least removals from it), and 2) the map from item to pending-task queue.

All operations described below take that single critical section.

Worker loop start: To pull a task off the main queue, the worker grabs the critical section, dequeues a task, and reads the underlying item out of the task. He then looks that item up in the map. There are two possibilities: the item is in the map, or it is missing.

  1. If it is missing, nobody is working on it. He inserts an entry into the map whose pending-task queue contains his task as its only element, releases the critical section, and goes to work on the task. When he is done, he grabs the critical section again, looks the item up in the map, and deletes his own task from the pending-task queue he finds there. Now there are two possibilities: that queue is now empty, or it is not.

    1. If it is empty, he removes the item from the map, releases the critical section, and goes to find more work to do (back to "worker loop start").

    2. If it is not empty, there is more work queued for this item. He takes the next entry from the pending-task queue, releases the critical section, and works on it as above.

  2. Otherwise, still holding the critical section, he has found a map entry for the underlying item, which means another worker is working on it. He enqueues his task (the one he pulled off the main queue) onto that item's pending-task queue, releases the critical section, and goes to find more work to do ("worker loop start").

(This approach might not be correct, but what the hell, I just thought of it. I'm fairly sure it can be made to work correctly in all concurrent scenarios. You might argue it is complex, that it keeps several data-structure operations under a single critical section, which could add more latency than you'd like, or that I didn't supply code. Whatever. Consider this a concept proposal. You can modify it until it works, weigh it against any other answer provided here, or weigh it against any other algorithm you propose. It's a start!)
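As a concept-level sketch, the loop above might look like the following in Java. All names (`SerializingDispatcher`, `Task`, `requestShutdown`) are illustrative, not from the post, and a `synchronized` block on one lock object stands in for the single critical section:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch of the scheme above (illustrative names). One lock guards both
// the main task queue and the item -> pending-task-queue map.
class SerializingDispatcher {
    // A task names the underlying item it touches plus the work itself.
    record Task(String item, Runnable work) {}

    private final Object lock = new Object();
    private final Queue<Task> mainQueue = new ArrayDeque<>();
    private final Map<String, Queue<Task>> pending = new HashMap<>();
    private boolean stopping = false;

    void produce(Task t) {
        synchronized (lock) {
            mainQueue.add(t);
            lock.notifyAll();
        }
    }

    void requestShutdown() {
        synchronized (lock) {
            stopping = true;
            lock.notifyAll();
        }
    }

    // The worker loop, following the answer's steps.
    void workerLoop() throws InterruptedException {
        while (true) {
            Task task;
            synchronized (lock) {                        // "worker loop start"
                while (mainQueue.isEmpty()) {
                    if (stopping) return;
                    lock.wait();
                }
                task = mainQueue.poll();
                Queue<Task> parked = pending.get(task.item());
                if (parked != null) {                    // case 2: item is owned
                    parked.add(task);                    // park it for the owner
                    continue;                            // go find other work
                }
                Queue<Task> mine = new ArrayDeque<>();   // case 1: claim the item
                mine.add(task);
                pending.put(task.item(), mine);
            }
            String item = task.item();
            while (task != null) {
                task.work().run();                       // real work, lock released
                synchronized (lock) {
                    Queue<Task> mine = pending.get(item);
                    mine.poll();                         // our finished task is the head
                    task = mine.peek();                  // case 1.2: more parked work?
                    if (task == null) {
                        pending.remove(item);            // case 1.1: release the item
                    }
                }
            }
        }
    }
}
```

Note that only one lock exists no matter how many objects are in flight, which addresses the original concern about keeping a hundred per-object critical sections in a dictionary; the map only ever holds entries for items currently being worked on.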

Comments (1)

  • +1 – Yes, that should work. Thanks! — Dec 17, 2021 at 02:41