Finer Metagenomic Reconstruction via Biodiversity Optimization

When analyzing communities of microorganisms from their sequenced DNA, an important task is taxonomic profiling: enumerating the presence and relative abundance of all organisms, or merely of all taxa, contained in the sample. This task can be tackled via compressive-sensing-based approaches, which favor communities featuring the fewest organisms among those consistent with the observed DNA data. Despite their successes, these parsimonious approaches sometimes conflict with biological realism by overlooking organism similarities. Here, we leverage a recently developed notion of biological diversity that accounts for organism similarities while retaining the optimization strategy underlying compressive-sensing-based approaches. We demonstrate that minimizing biological diversity still produces sparse taxonomic profiles, and we experimentally validate its superiority over existing compressive-sensing-based approaches. Although we show that the objective function is almost never convex and often concave, generally yielding NP-hard problems, we exhibit ways of representing organism similarities for which minimizing diversity can be performed via a sequence of linear programs guaranteed to decrease diversity. Better yet, when biological similarity is quantified by k-mer co-occurrence (a popular notion in bioinformatics), minimizing diversity reduces to a single linear program that can utilize multiple k-mer sizes to enhance performance. In proof-of-concept experiments, we verify that the latter procedure can lead to significant gains when taxonomically profiling a metagenomic sample, both in terms of reconstruction accuracy and computational performance. Reproducible code is available at https://github.com/dkoslicki/MinimizeBiologicalDiversity.
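To make the link between diversity and sparsity concrete, the sketch below evaluates a similarity-sensitive diversity index on a toy abundance vector. It assumes that the notion of diversity meant here is the one of Leinster and Cobbold (2012), namely D = (sum_i p_i (Zp)_i^(q-1))^(1/(1-q)) for an order q different from 1; this is an assumption on our part, not something the abstract states. The function name `similarity_diversity`, the toy vectors, and the similarity matrices are hypothetical illustrations and are not taken from the authors' repository.

```python
def similarity_diversity(p, Z, q=0.0):
    """Similarity-sensitive diversity of order q (q != 1).

    p : list of relative abundances (nonnegative, summing to 1)
    Z : square similarity matrix as a list of lists, with Z[i][i] == 1
    Assumed formula (Leinster-Cobbold): (sum_i p_i * (Zp)_i**(q-1))**(1/(1-q)),
    where the sum runs over organisms that are actually present (p_i > 0).
    """
    if q == 1:
        raise ValueError("order q = 1 requires a separate limiting formula")
    n = len(p)
    total = 0.0
    for i in range(n):
        if p[i] > 0:
            # (Zp)_i: abundance of organism i plus that of organisms similar to it
            zp_i = sum(Z[i][j] * p[j] for j in range(n))
            total += p[i] * zp_i ** (q - 1)
    return total ** (1.0 / (1.0 - q))


# Hypothetical toy community: three organisms present out of four candidates.
p = [0.5, 0.3, 0.2, 0.0]

# No similarity information: Z is the identity matrix.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Organisms 0 and 1 are declared 90% similar (made-up value).
similar = [row[:] for row in identity]
similar[0][1] = similar[1][0] = 0.9

print(similarity_diversity(p, identity, q=0))  # counts the organisms present
print(similarity_diversity(p, similar, q=0))   # smaller: similar organisms count for less
```

With the identity matrix and q = 0, the index is simply the number of organisms present, which is the quantity that sparsity-promoting approaches minimize. Once two organisms are declared similar, the same abundance vector scores a lower diversity, which illustrates how a diversity objective can prefer biologically related organisms over unrelated ones while still rewarding parsimony.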
