Libra: scalable k-mer–based tool for massive all-vs-all metagenome comparisons
Ken Youens-Clark | Bonnie L Hurwitz | Alise J Ponsero | Illyoung Choi | John H Hartman | Matthew Bomhoff
[1] Matthew B. Sullivan,et al. The Pacific Ocean Virome (POV): A Marine Viral Metagenomic Dataset and Associated Protein Clusters for Quantitative Viral Ecology , 2013, PloS one.
[2] Anna-Lan Huang,et al. Similarity Measures for Text Document Clustering , 2008 .
[3] Pavan Balaji,et al. Bloomfish: A Highly Scalable Distributed K-mer Counting Framework , 2017, 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS).
[4] Steven B Cannon,et al. Bringing your tools to CyVerse Discovery Environment using Docker , 2016, F1000Research.
[5] Andrew Zisserman,et al. Near Duplicate Image Detection: min-Hash and tf-idf Weighting , 2008, BMVC.
[6] S. Quake,et al. Dissecting biological “dark matter” with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth , 2007, Proceedings of the National Academy of Sciences.
[7] S. Kurtz,et al. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes , 2008, BMC Genomics.
[8] Natalia N. Ivanova,et al. Insights into the phylogeny and coding potential of microbial dark matter , 2013, Nature.
[9] Antti Honkela,et al. Exploration and retrieval of whole-metagenome sequencing samples , 2013, Bioinform..
[10] M. Michie. Use of the Bray-Curtis similarity measure in cluster analysis of foraminiferal data , 1982 .
[11] Dmitry G. Alexeev,et al. MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data , 2016, Bioinform..
[12] Bonnie L Hurwitz,et al. Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses , 2014, Proceedings of the National Academy of Sciences.
[13] Bonnie L Hurwitz,et al. Depth-stratified functional and taxonomic niche specialization in the ‘core’ and ‘flexible’ Pacific Ocean Virome , 2014, The ISME Journal.
[14] Brian C. Thomas,et al. A new view of the tree of life , 2016, Nature Microbiology.
[15] M. Diepenbroek,et al. PANGAEA: an information system for environmental sciences , 2002 .
[16] Yu-Wei Wu,et al. A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-Tuples , 2010, RECOMB.
[17] Luiz Irber,et al. sourmash: a library for MinHash sketching of DNA , 2016, J. Open Source Softw..
[18] Jianhua Lin,et al. Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.
[19] B. Hurwitz,et al. 16S rRNA gene sequencing on a benchtop sequencer: accuracy for identification of clinically important bacteria , 2017, Journal of applied microbiology.
[20] Robert C. Edgar,et al. BIOINFORMATICS APPLICATIONS NOTE , 2001 .
[21] P. Bork,et al. Patterns and ecological drivers of ocean viral communities , 2015, Science.
[22] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[23] Andrei Z. Broder,et al. On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997.
[24] Weisong Shi,et al. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping , 2011, BMC Research Notes.
[25] M. DePristo,et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.
[26] Dmitry S. Ischenko,et al. Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis , 2016, BMC Bioinformatics.
[27] Hooman Zabeti,et al. Improving Min Hash via the Containment Index with Applications to Metagenomic Analysis , 2017 .
[28] T. Thomas,et al. GemSIM: general, error-model based simulator of next-generation sequencing data , 2012, BMC Genomics.
[29] Michael C. Schatz,et al. Rapid parallel genome indexing with MapReduce , 2011, MapReduce '11.
[30] Benjamin J. Raphael,et al. The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families , 2007, PLoS biology.
[31] G. Bratbak,et al. High abundance of viruses found in aquatic environments , 1989, Nature.
[32] Gerard Salton,et al. A vector space model for automatic indexing , 1975, CACM.
[33] Stéphane Le Crom,et al. Eoulsan: a cloud computing-based framework facilitating high throughput sequencing analyses , 2012, Bioinform..
[34] Kai Wang,et al. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data , 2013, Bioinform..
[35] Frank Oliver Glöckner,et al. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences , 2004, BMC Bioinformatics.
[36] Joshua M. Stuart,et al. The Cancer Genome Atlas Pan-Cancer analysis project , 2013, Nature Genetics.
[37] Luis Pedro Coelho,et al. Structure and function of the global ocean microbiome , 2015, Science.
[38] B. Langmead,et al. Cloud-scale RNA-sequencing differential expression analysis with Myrna , 2010, Genome Biology.
[39] Christian Schlötterer,et al. DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster , 2013, PloS one.
[40] Frederic D. Bushman,et al. Conservation of Gene Cassettes among Diverse Viruses of the Human Gut , 2012, PloS one.
[41] Brian D. Ondov,et al. Mash: fast genome and metagenome distance estimation using MinHash , 2015, Genome Biology.
[42] Katherine H. Huang,et al. Structure, Function and Diversity of the Healthy Human Microbiome , 2012, Nature.
[43] Xiaoyu Wang,et al. A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis , 2012, Briefings Bioinform..
[44] Limin Fu,et al. Artificial and natural duplicates in pyrosequencing reads of metagenomic data , 2010, BMC Bioinformatics.
[45] Winston Haynes,et al. Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins , 2011, Omics: a journal of integrative biology.
[46] Peter J. Tonellato,et al. Cloud computing for comparative genomics , 2010, BMC Bioinformatics.
[47] Michael C. Schatz,et al. CloudBurst: highly sensitive read mapping with MapReduce , 2009, Bioinform..
[48] Dominique Lavenier,et al. Compareads: comparing huge metagenomic experiments , 2012, BMC Bioinformatics.
[49] Dominique Lavenier,et al. Commet: Comparing and combining multiple metagenomic datasets , 2014, 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
[50] Yunpeng Cai,et al. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time , 2011, Nucleic acids research.
[51] Dominique Lavenier,et al. Multiple comparative metagenomics using multiset k-mer counting , 2016, PeerJ Comput. Sci..
[52] M. Schatz,et al. Searching for SNPs with cloud computing , 2009, Genome Biology.
[53] Shujiro Okuda,et al. Virtual metagenome reconstruction from 16S rRNA gene sequences , 2012, Nature Communications.
[54] Yi Luo,et al. How independent are the appearances of n-mers in different genomes? , 2004, Bioinform..
[55] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[56] Shaoliang Peng,et al. Bioinformatics applications on Apache Spark , 2018, GigaScience.
[57] Sven Rahmann,et al. SimLoRD: Simulation of Long Read Data , 2016, Bioinform..
[58] B. S. Manjunath,et al. The iPlant Collaborative: Cyberinfrastructure for Plant Biology , 2011, Front. Plant Sci..