Combinatorial Group Testing and Sparse Recovery Schemes with Near-Optimal Decoding Time

In the long-studied problem of combinatorial group testing, one is asked to detect a set of $k$ defective items out of a population of size $n$, using $m\ll n$ disjunctive measurements. In the non-adaptive setting, the most widely used combinatorial objects are disjunct and list-disjunct matrices, which serve as the incidence matrices of test schemes. Disjunct matrices allow the identification of the exact set of defectives, whereas list-disjunct matrices identify a small superset of the defectives. Apart from the combinatorial guarantees, it is often of key interest to equip measurement designs with efficient decoding algorithms. The most efficient decoders should run in time sublinear in $n$, and ideally near-linear in the number of measurements $m$. In this work, we give several constructions with an optimal number of measurements and near-optimal decoding time for the most fundamental group testing tasks, as well as for central tasks in the compressed sensing and heavy hitters literature. For many of those tasks, the previous measurement-optimal constructions needed time either quadratic in the number of measurements or linear in the universe size.
Among our results are the following: a construction of disjunct matrices matching the best-known construction in terms of the number of rows $m$, but achieving nearly linear decoding time in $m$; a construction of list-disjunct matrices with the optimal $m=O(k\log(n/k))$ number of rows and nearly linear decoding time in $m$; error-tolerant variations of the above constructions; a non-adaptive group testing scheme for the “for-each” model with $m=O(k\log n)$ measurements and $O(m)$ decoding time; a streaming algorithm for the “for-all” version of the heavy hitters problem in the strict turnstile model with near-optimal query time, as well as a “list decoding” variant that also obtains near-optimal update time and $O(k\log(n/k))$ space usage; an $\ell_{2}/\ell_{2}$ weak identification system for compressed sensing with nearly optimal sample complexity and nearly linear decoding time in the sketch length. Most of our results are obtained via a clean and novel approach that avoids list-recoverable codes or related complex techniques that were present in almost every state-of-the-art work on efficiently decodable constructions of such objects.
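
To make the setting concrete, the following is a minimal Python sketch of disjunctive measurements and of the textbook naive decoder, which discards every item that appears in a negative test. It is not one of the constructions of this work: the naive decoder takes time $O(mn)$, which is exactly the linear dependence on the universe size that the results above avoid. The function names (`measure`, `naive_decode`, `is_disjunct`) and the toy matrix, whose columns are the 2-element subsets of 4 tests, are assumptions chosen for illustration only.

```python
from itertools import combinations

def measure(matrix, defectives):
    """Disjunctive (OR) measurements: a test is positive iff it contains a defective."""
    return [any(row[j] for j in defectives) for row in matrix]

def naive_decode(matrix, outcomes):
    """Keep every item that appears in no negative test. Takes O(m*n) time."""
    n = len(matrix[0])
    candidates = set(range(n))
    for row, positive in zip(matrix, outcomes):
        if not positive:
            candidates -= {j for j in range(n) if row[j]}
    return candidates

def is_disjunct(matrix, k):
    """Brute-force check that no column is covered by the union of k other columns."""
    n = len(matrix[0])
    supports = [{i for i, row in enumerate(matrix) if row[j]} for j in range(n)]
    for j in range(n):
        others = [c for c in range(n) if c != j]
        for group in combinations(others, k):
            covered = set().union(*(supports[c] for c in group))
            if supports[j] <= covered:
                return False
    return True

# Toy design (illustrative assumption): m = 4 tests, n = 6 items,
# item j participates in the two tests listed in columns[j].
columns = list(combinations(range(4), 2))
matrix = [[1 if i in col else 0 for col in columns] for i in range(4)]

outcomes = measure(matrix, {3})
recovered = naive_decode(matrix, outcomes)
```

In this toy design no column contains another, so the matrix is 1-disjunct and a single defective is recovered exactly. It is not 2-disjunct, so with two defectives the naive decoder is only guaranteed to return a superset of the defective set.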

[52]  Vasileios Nakos,et al.  Almost optimal phaseless compressed sensing with sublinear decoding time , 2017, 2017 IEEE International Symposium on Information Theory (ISIT).

[53]  Ding-Zhu Du,et al.  New Constructions of One- and Two-Stage Pooling Designs , 2008, J. Comput. Biol..

[54]  Ely Porat,et al.  k -Mismatch with Don't Cares , 2007, ESA.

[55]  Ding-Zhu Du,et al.  An unexpected meeting of four seemingly unrelated problems: graph testing, DNA complex screening, superimposed codes and secure key distribution , 2007, J. Comb. Optim..

[56]  M. Sobel,et al.  Group testing to eliminate efficiently all defectives in a binomial sample , 1959 .

[57]  Amin Karbasi,et al.  Graph-Constrained Group Testing , 2010, IEEE Transactions on Information Theory.

[58]  Moni Naor,et al.  Deterministic History-Independent Strategies for Storing Information on Write-Once Memories , 2007, Theory Comput..

[59]  Moses Charikar,et al.  Finding frequent items in data streams , 2002, Theor. Comput. Sci..

[60]  Richard E. Ladner,et al.  Group testing for image compression , 2000, Proceedings DCC 2000. Data Compression Conference.

[61]  Miklós Ruszinkó,et al.  On the Upper Bound of the Size of the R-Cover-Free Families , 1993, Proceedings. IEEE International Symposium on Information Theory.

[62]  David P. Woodruff,et al.  Perfect Lp Sampling in a Data Stream , 2018, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[63]  Piotr Indyk,et al.  Nearly Optimal Deterministic Algorithm for Sparse Walsh-Hadamard Transform , 2015, SODA.

[64]  Graham Cormode,et al.  An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.

[65]  Zhao Song,et al.  Solving tall dense linear programs in nearly linear time , 2020, STOC.

[66]  A. Sterrett On the Detection of Defective Members of Large Populations , 1957 .

[67]  Sidharth Jaggi,et al.  Nearly optimal sparse group testing , 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[68]  Mahdi Cheraghchi,et al.  Improved Constructions for Non-adaptive Threshold Group Testing , 2010, Algorithmica.

[69]  David P. Woodruff,et al.  BPTree: an $\ell_2$ heavy hitters algorithm using constant memory , 2016 .

[70]  Junan Zhu,et al.  Noisy Pooled PCR for Virus Testing , 2020, bioRxiv.

[71]  Mikhail Kapralov,et al.  Sparse fourier transform in any constant dimension with nearly-optimal sample complexity in sublinear time , 2016, STOC.

[72]  David P. Woodruff,et al.  BPTree: An ℓ2 Heavy Hitters Algorithm Using Constant Memory , 2016, PODS.

[73]  R. Gregory Taylor,et al.  Modern computer algebra , 2002, SIGA.

[74]  David P. Woodruff,et al.  An Optimal Algorithm for ℓ1-Heavy Hitters in Insertion Streams and Related Problems , 2018, ACM Trans. Algorithms.

[75]  Huacheng Yu,et al.  Faster Update Time for Turnstile Streaming Algorithms , 2019, SODA.

[76]  Christos Tzamos,et al.  Fast Modular Subset Sum using Linear Sketching , 2018, SODA.

[77]  Atri Rudra,et al.  Efficiently decodable non-adaptive group testing , 2010, SODA '10.

[78]  Vasileios Nakos,et al.  (Nearly) Sample-Optimal Sparse Fourier Transform in Any Dimension; RIPless and Filterless , 2019, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).

[79]  Piotr Indyk,et al.  Nearly optimal sparse fourier transform , 2012, STOC '12.

[80]  A. Macula Probabilistic nonadaptive group testing in the presence of errors and DNA library screening , 1999 .

[81]  Mayank Bakshi,et al.  Efficient Algorithms for Noisy Group Testing , 2017, IEEE Transactions on Information Theory.

[82]  Piotr Indyk,et al.  Simple and practical algorithm for sparse Fourier transform , 2012, SODA.

[83]  Mary Wootters,et al.  Unconstraining graph-constrained group testing , 2018, APPROX-RANDOM.

[84]  Charles J. Colbourn,et al.  Handbook of Combinatorial Designs, Second Edition (Discrete Mathematics and Its Applications) , 2006 .

[85]  Salil P. Vadhan,et al.  The unified theory of pseudorandomness , 2010 .

[86]  Atri Rudra,et al.  ℓ2/ℓ2-Foreach Sparse Recovery with Low Risk , 2013, ICALP.

[87]  Mikkel Thorup,et al.  Heavy Hitters via Cluster-Preserving Clustering , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[88]  Richard C. Singleton,et al.  Nonrandom binary superimposed codes , 1964, IEEE Trans. Inf. Theory.

[89]  Graham Cormode,et al.  What's hot and what's not: tracking most frequent items dynamically , 2003, PODS '03.

[90]  David P. Woodruff New Algorithms for Heavy Hitters in Data Streams (Invited Talk) , 2016, ICDT.

[91]  Ely Porat,et al.  Approximate sparse recovery: optimizing time and measurements , 2009, STOC '10.

[92]  V. V. Rykov,et al.  Superimposed distance codes , 1989 .

[93]  Karl Bringmann,et al.  Top-𝑘-convolution and the quest for near-linear output-sensitive subset sum , 2020, STOC.

[94]  Daniel A. Spielman,et al.  Linear-time encodable and decodable error-correcting codes , 1995, STOC '95.

[95]  Ameya Velingker,et al.  Dimension-independent Sparse Fourier Transform , 2019, SODA.

[96]  Weili Wu,et al.  On error-tolerant DNA screening , 2006, Discret. Appl. Math..

[97]  Piotr Indyk,et al.  Combining geometry and combinatorics: A unified approach to sparse signal recovery , 2008, 2008 46th Annual Allerton Conference on Communication, Control, and Computing.

[98]  Emmanuel J. Candès,et al.  Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies? , 2004, IEEE Transactions on Information Theory.

[99]  Anna C. Gilbert,et al.  Improved time bounds for near-optimal sparse Fourier representations , 2005, SPIE Optics + Photonics.

[100]  Holger Rauhut,et al.  A Mathematical Introduction to Compressive Sensing , 2013, Applied and Numerical Harmonic Analysis.

[101]  Binbin Chen,et al.  Sublinear-Time Non-Adaptive Group Testing With O(k log n) Tests via Bit-Mixing Coding , 2019, IEEE Transactions on Information Theory.

[102]  Arkadii G. D'yachkov,et al.  A survey of superimposed code theory , 1983 .

[103]  Marios Hadjieleftheriou,et al.  Finding the frequent items in streams of data , 2009, CACM.

[104]  Miklós Ruszinkó,et al.  On the upper bound of the size of the r -cover-free families , 1994 .

[105]  R. Dorfman The Detection of Defective Members of Large Populations , 1943 .

[106]  David P. Woodruff,et al.  An Optimal Algorithm for l1-Heavy Hitters in Insertion Streams and Related Problems , 2016, PODS.

[107]  D. Du,et al.  Pooling Designs And Nonadaptive Group Testing: Important Tools For Dna Sequencing , 2006 .