Intermediate Value Linearizability: A Quantitative Correctness Criterion

Big data processing systems often employ batched updates and data sketches to estimate certain properties of large data sets. For example, a CountMin sketch approximates the frequencies at which elements occur in a data stream, and a batched counter counts events in batches. This paper focuses on the correctness of concurrent implementations of such objects. Specifically, we consider quantitative objects, whose return values are from a totally ordered domain, with an emphasis on $(e,d)$-bounded objects that estimate a quantity with an error of at most $e$ with probability at least $1 - d$. The de facto correctness criterion for concurrent objects is linearizability. Under linearizability, when a read overlaps an update, it must return the object's value either before the update or after it. Consider, for example, a single batched increment operation that counts three new events, bumping a batched counter's value from $7$ to $10$. In a linearizable implementation of the counter, an overlapping read must return either $7$ or $10$. We observe, however, that in typical use cases, any intermediate value would also be acceptable. To capture this degree of freedom, we propose Intermediate Value Linearizability (IVL), a new correctness criterion that relaxes linearizability to allow returning intermediate values, for instance $8$ in the example above. Roughly speaking, IVL allows a read to return any value that is bounded between two return values that are legal under linearizability. A key feature of IVL is that concurrent IVL implementations of $(e,d)$-bounded objects remain $(e,d)$-bounded. To illustrate the power of this result, we give a straightforward and efficient concurrent implementation of an $(e,d)$-bounded CountMin sketch, which is IVL (albeit not linearizable). Finally, we show that IVL admits implementations that are inherently cheaper than linearizable ones.
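The batched-counter example can be made concrete with a small sketch. It is an illustration only, not the paper's formal definition: the function names `linearizable_reads` and `ivl_reads` are hypothetical, and the code assumes integer counter values, non-negative batch sizes, and that every listed batched increment overlaps the read and may be ordered either before or after it.

```python
from itertools import combinations


def linearizable_reads(value_before, overlapping_batches):
    """Values a read may return under linearizability.

    Assumption (for illustration): each overlapping batched increment can be
    linearized either before or after the read, independently of the others.
    """
    results = set()
    for k in range(len(overlapping_batches) + 1):
        for subset in combinations(overlapping_batches, k):
            results.add(value_before + sum(subset))
    return results


def ivl_reads(value_before, overlapping_batches):
    """Values a read may return under the rough IVL rule from the abstract:
    anything bounded between two values that are legal under linearizability."""
    legal = linearizable_reads(value_before, overlapping_batches)
    return set(range(min(legal), max(legal) + 1))


# Counter at 7, one overlapping batched increment of 3 events.
print(sorted(linearizable_reads(7, [3])))  # [7, 10]
print(sorted(ivl_reads(7, [3])))           # [7, 8, 9, 10]
```

Under these assumptions, the values $8$ and $9$ are allowed by the IVL rule but not by linearizability, which matches the example in the abstract.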
