Straggler Identification in Round-Trip Data Streams via Newton's Identities and Invertible Bloom Filters

In this paper, we study the straggler identification problem, in which an algorithm must determine the identities of the remaining members of a set after a large number of insertion and deletion operations have been performed on it and only relatively few members remain. The goal is to do this in o(n) space, where n is the total number of identities. Straggler identification has applications, for example, in determining the unacknowledged packets in a high-bandwidth multicast data stream. We provide a deterministic solution to the straggler identification problem that uses only O(d log n) bits, based on a novel application of Newton's identities for symmetric polynomials. This solution can identify any subset of d stragglers from a set of n identifiers of O(log n) bits each, assuming that there are no false deletions of identities not already in the set. Indeed, we give a lower-bound argument showing that no small-space deterministic solution to the straggler identification problem can be guaranteed to handle false deletions. Nevertheless, we provide a simple randomized solution using O(d log n log(1/ε)) bits that can maintain a multiset and solve the straggler identification problem while tolerating false deletions, where ε > 0 is a user-defined parameter bounding the probability of an incorrect response. This randomized solution is based on a new type of Bloom filter, which we call the invertible Bloom filter.
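The deterministic idea can be illustrated with a small sketch, under assumptions that are ours rather than the paper's: identifiers are integers in 1..n, arithmetic is done modulo a prime p with p > n and p > d, and the structure keeps only a count m and the power sums P_k = sum of x^k for k = 1..d. When m <= d, Newton's identities k*e_k = sum_{i=1..k} (-1)^(i-1) e_{k-i} P_i recover the elementary symmetric polynomials e_1, ..., e_m, and the stragglers are the roots of prod_j (z - x_j). The class name `PowerSumStragglerSketch` is hypothetical, and the brute-force root search over 1..n is a simplification for clarity; a practical version would more likely use polynomial root finding over the finite field.

```python
class PowerSumStragglerSketch:
    """Tracks a dynamic set of integer ids in 1..n with d power sums mod a prime p.

    Assumptions for this sketch: p is prime, p > n (distinct ids stay distinct
    mod p) and p > d (so 1..d are invertible), and every deleted id is
    currently in the set (no false deletions).
    """

    def __init__(self, n: int, d: int, p: int):
        if not (p > n and p > d):
            raise ValueError("need a prime p with p > n and p > d")
        self.n, self.d, self.p = n, d, p
        self.count = 0                   # number of ids currently in the set
        self.power_sums = [0] * (d + 1)  # power_sums[k] = sum of x**k mod p, k = 1..d

    def _update(self, x: int, sign: int) -> None:
        self.count += sign
        term = 1
        for k in range(1, self.d + 1):
            term = term * x % self.p     # term = x**k mod p
            self.power_sums[k] = (self.power_sums[k] + sign * term) % self.p

    def insert(self, x: int) -> None:
        self._update(x, +1)

    def delete(self, x: int) -> None:
        self._update(x, -1)

    def stragglers(self) -> list:
        m, p = self.count, self.p
        if not 0 <= m <= self.d:
            raise ValueError("straggler count must lie in 0..d")
        # Newton's identities: k*e_k = sum_{i=1..k} (-1)^(i-1) * e_{k-i} * P_i
        e = [1] + [0] * m
        for k in range(1, m + 1):
            acc = sum((-1) ** (i - 1) * e[k - i] * self.power_sums[i]
                      for i in range(1, k + 1))
            e[k] = acc * pow(k, -1, p) % p
        # prod_j (z - x_j) = sum_k (-1)^k e_k z^(m-k); coefficients, highest degree first
        coeffs = [(-1) ** k * e[k] % p for k in range(m + 1)]

        def evaluate(z: int) -> int:
            value = 0
            for c in coeffs:             # Horner's rule mod p
                value = (value * z + c) % p
            return value

        # Brute-force root search over the whole universe: O(n*d) time, for clarity only.
        return [x for x in range(1, self.n + 1) if evaluate(x) == 0]
```

For example, with n = 1000, d = 3 and p = 1009, inserting 1..1000 and then deleting every id except 17, 256 and 999 leaves a structure whose `stragglers()` call returns those three ids. The space used is d + 1 residues mod p regardless of how many operations were performed. A single false deletion, such as deleting an id that was never inserted, leaves a count and power sums that no longer describe the true straggler set, which illustrates why false deletions are problematic for this kind of deterministic scheme.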
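The randomized approach can be illustrated with a minimal invertible-Bloom-filter-style sketch. The cell layout here is an assumption borrowed from later invertible Bloom lookup tables and may differ from the paper's construction: each cell stores a count, a sum of ids and a sum of per-id checksums, and the table is split into k sub-tables so every id touches k distinct cells. Listing works by repeatedly finding a "pure" cell that holds copies of a single id, reporting it, and subtracting it from its other cells. Because sums are signed, a false deletion simply surfaces as an entry with multiplicity -1, and repeated insertions surface as multiplicities greater than 1. The class name, the parameters `cells_per_table`, `k` and `seed`, and the use of keyed `hashlib.blake2b` as the hash family are illustrative choices; sizing the table to meet a target error probability ε is not modeled.

```python
import hashlib


class InvertibleBloomFilter:
    """Sketch of an invertible Bloom filter over a signed multiset of integer ids.

    Assumed cell layout (illustrative, not necessarily the paper's): a count,
    a sum of ids, and a sum of per-id checksums. The table is split into k
    sub-tables, so each id touches exactly k distinct cells.
    """

    def __init__(self, cells_per_table: int, k: int = 3, seed: bytes = b"ibf-demo"):
        self.k, self.w, self.seed = k, cells_per_table, seed
        size = k * cells_per_table
        self.count = [0] * size
        self.id_sum = [0] * size
        self.hash_sum = [0] * size

    def _hash(self, x: int, salt: int) -> int:
        data = x.to_bytes(32, "big", signed=True) + salt.to_bytes(2, "big")
        digest = hashlib.blake2b(data, key=self.seed, digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _cells(self, x: int) -> list:
        # One cell per sub-table j, each chosen by a differently salted hash.
        return [j * self.w + self._hash(x, j) % self.w for j in range(self.k)]

    def _checksum(self, x: int) -> int:
        return self._hash(x, self.k)  # salt k is reserved for the checksum

    def _update(self, x: int, c: int) -> None:
        h = self._checksum(x)
        for i in self._cells(x):
            self.count[i] += c
            self.id_sum[i] += c * x
            self.hash_sum[i] += c * h

    def insert(self, x: int) -> None:
        self._update(x, +1)

    def delete(self, x: int) -> None:
        self._update(x, -1)  # a false deletion simply leaves multiplicity -1

    def _pure(self, i: int):
        """Return (id, multiplicity) if cell i appears to hold copies of one id."""
        c = self.count[i]
        if c == 0 or self.id_sum[i] % c != 0:
            return None
        x = self.id_sum[i] // c
        if self.hash_sum[i] == c * self._checksum(x) and i in self._cells(x):
            return x, c
        return None

    def list_entries(self):
        """Peel pure cells (destructively); returns ({id: multiplicity}, success)."""
        found = {}
        progress = True
        while progress:
            progress = False
            for i in range(len(self.count)):
                entry = self._pure(i)
                if entry is not None:
                    x, c = entry
                    found[x] = found.get(x, 0) + c
                    self._update(x, -c)  # remove all c copies of x from its k cells
                    progress = True
        success = not (any(self.count) or any(self.id_sum) or any(self.hash_sum))
        return {x: c for x, c in found.items() if c != 0}, success
```

As a usage example, inserting ids 1..2000, deleting all of them except 42 and 1234, and then falsely deleting 123456 should make `list_entries()` report `{42: 1, 1234: 1, 123456: -1}` with `success` set, barring an unlucky hash collision. The `success` flag is how this sketch signals that peeling got stuck, which is the event a properly sized filter makes unlikely.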
