Combinatorial Group Testing and Sparse Recovery Schemes with Near-Optimal Decoding Time

In the long-studied problem of combinatorial group testing, one is asked to detect a set of $k$ defective items out of a population of size $n$, using $m\ll n$ disjunctive measurements. In the non-adaptive setting, the most widely used combinatorial objects are disjunct and list-disjunct matrices, which serve as incidence matrices of test schemes. Disjunct matrices allow the identification of the exact set of defectives, whereas list-disjunct matrices identify a small superset of the defectives. Apart from the combinatorial guarantees, it is often of key interest to equip measurement designs with efficient decoding algorithms. The most efficient decoders should run in time sublinear in $n$, and ideally near-linear in the number of measurements $m$. In this work, we give several constructions with an optimal number of measurements and near-optimal decoding time for the most fundamental group testing tasks, as well as for central tasks in the compressed sensing and heavy hitters literature. For many of those tasks, the previous measurement-optimal constructions needed time either quadratic in the number of measurements or linear in the universe size.
Among our results are the following: a construction of disjunct matrices matching the best-known construction in terms of the number of rows $m$, but achieving nearly linear decoding time in $m$; a construction of list-disjunct matrices with the optimal $m=O(k\log(n/k))$ number of rows and nearly linear decoding time in $m$; error-tolerant variations of the above constructions; a non-adaptive group testing scheme for the “for-each” model with $m=O(k\log n)$ measurements and $O(m)$ decoding time; a streaming algorithm for the “for-all” version of the heavy hitters problem in the strict turnstile model with near-optimal query time, as well as a “list decoding” variant that also obtains near-optimal update time and $O(k\log(n/k))$ space usage; an $\ell_{2}/\ell_{2}$ weak identification system for compressed sensing with nearly optimal sample complexity and nearly linear decoding time in the sketch length. Most of our results are obtained via a clean and novel approach that avoids list-recoverable codes or related complex techniques that were present in almost every state-of-the-art work on efficiently decodable constructions of such objects.
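
To make the setting concrete, the following is a minimal Python sketch of non-adaptive group testing with the textbook "cover" decoder, which discards every item that appears in a negative test. It is not one of the constructions from this work: the decoder runs in time $O(mn)$, which is exactly the linear dependence on the universe size that the results above aim to avoid. The function names (`sperner_design`, `run_tests`, `cover_decode`) and the toy design are assumptions made for illustration only.

```python
from itertools import combinations


def sperner_design(m, w):
    """Toy test design (an assumption for illustration): one item per
    w-subset of the m tests. No column's support contains another's,
    so the matrix is 1-disjunct. Rows are tests, columns are items."""
    supports = list(combinations(range(m), w))
    return [[1 if t in s else 0 for s in supports] for t in range(m)]


def run_tests(matrix, defectives):
    """Disjunctive measurements: a test is positive (1) if and only if
    it contains at least one defective item."""
    return [int(any(row[j] for j in defectives)) for row in matrix]


def cover_decode(matrix, outcomes):
    """Naive decoder: remove every item that occurs in a negative test.
    Takes O(m * n) time, i.e. linear in the universe size n."""
    n = len(matrix[0])
    candidates = set(range(n))
    for row, y in zip(matrix, outcomes):
        if y == 0:
            candidates -= {j for j in range(n) if row[j]}
    return sorted(candidates)


if __name__ == "__main__":
    design = sperner_design(4, 2)  # 4 tests, 6 items
    outcomes = run_tests(design, [3])
    print(cover_decode(design, outcomes))
```

With a single defective, the 1-disjunct toy design lets the decoder return exactly that item. With more defectives than the design supports, the output is only guaranteed to be a superset of the defectives, which loosely mirrors the distinction between exact identification and superset identification drawn above.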
