Restricted memory-friendly lock-free bounded queues

The multi-producer multi-consumer FIFO queue is one of the fundamental concurrent data structures used in software systems. Much progress has been made in designing concurrent bounded and unbounded queues [1--10]. As previous work shows, it is extremely hard to come up with an efficient algorithm. There are two orthogonal ways to improve the performance of fair concurrent queues: reducing the number of compare-and-swap (CAS) calls, and making queues more memory-friendly by reducing the number of allocations. The most efficient recent algorithms take the first path and use the more scalable fetch-and-add (FAA) instruction instead of CAS [3, 4, 10]. For the second path, the standard way to design memory-friendly versions is to implement queues on top of arrays [2--4, 10]. For unbounded queues it is reasonable to allocate memory in chunks and construct a linked queue on top of them; this approach significantly improves performance. Bounded queues are more memory-friendly by design: they are represented as a fixed-size array of elements even in theory. However, most bounded queue implementations still have issues with memory allocations: typically, they either use descriptors [5, 8] or store additional meta-information along with the elements [1, 6, 7, 9].
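To illustrate the CAS-versus-FAA distinction mentioned above, the following is a minimal C++ sketch of two ways a thread might claim the next position in an array-backed queue. It is not the algorithm of any cited work: the names `claim_with_cas`, `claim_with_faa`, and `slot_for` are hypothetical, and the sketch covers only index claiming, not element storage or emptiness and fullness checks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Claim the next ticket with a CAS retry loop. Under contention a thread
// may fail repeatedly because other threads update the counter first.
std::uint64_t claim_with_cas(std::atomic<std::uint64_t>& counter) {
    std::uint64_t seen = counter.load();
    while (!counter.compare_exchange_weak(seen, seen + 1)) {
        // On failure `seen` now holds the current value; retry with it.
    }
    return seen;
}

// Claim the next ticket with fetch-and-add. The operation cannot fail,
// so every thread obtains a distinct ticket in a single atomic step.
std::uint64_t claim_with_faa(std::atomic<std::uint64_t>& counter) {
    return counter.fetch_add(1);
}

// Map an ever-growing ticket to a cell of a fixed-size array
// (assumption: the array is used as a ring buffer).
std::size_t slot_for(std::uint64_t ticket, std::size_t capacity) {
    assert(capacity > 0);
    return static_cast<std::size_t>(ticket % capacity);
}
```

Both functions return the value the counter held before the increment, so callers can use them interchangeably; the difference lies only in how they behave when many threads compete for the same counter.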