These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure
Qingpeng Zhang | J. Pell | Rosangela Canino-Koning | Adina C. Howe | C. T. Brown
[1] Burton H. Bloom,et al. Space/time trade-offs in hash coding with allowable errors , 1970, CACM.
[2] Li Fan,et al. Summary cache: a scalable wide-area web cache sharing protocol , 2000, TNET.
[3] P. Pevzner,et al. An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.
[4] George Varghese,et al. New directions in traffic measurement and accounting , 2002, CCRV.
[5] M. Waterman,et al. Estimating the repeat structure and length of DNA sequences using L-tuples. , 2003, Genome research.
[6] Andrei Broder,et al. Network Applications of Bloom Filters: A Survey , 2004, Internet Math..
[7] Yossi Matias,et al. Spectral bloom filters , 2003, SIGMOD '03.
[8] Graham Cormode,et al. Summarizing and Mining Skewed Data Streams , 2005, SDM.
[9] Graham Cormode,et al. An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.
[10] P. Flajolet,et al. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm , 2007.
[11] Brian E. Granger,et al. IPython: A System for Interactive Scientific Computing , 2007, Computing in Science & Engineering.
[12] E. Birney,et al. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.
[13] S. Kurtz,et al. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes , 2008, BMC Genomics.
[14] Florin Rusu,et al. Sketches for size of join estimation , 2008, TODS.
[15] Peter J. Woolf,et al. GAGE: generally applicable gene set enrichment for pathway analysis , 2009, BMC Bioinformatics.
[16] Wenfei Fan,et al. Conditional functional dependencies for capturing data inconsistencies , 2008, TODS.
[17] Bryan O'Sullivan,et al. Using Bloom Filters for Large Scale Gene Sequence Analysis in Haskell , 2009, PADL.
[18] M. Metzker. Sequencing technologies — the next generation , 2010, Nature Reviews Genetics.
[19] David R. Kelley,et al. Quake: quality-aware detection and correction of sequencing errors , 2010, Genome Biology.
[20] P. Bork,et al. A human gut microbial gene catalogue established by metagenomic sequencing , 2010, Nature.
[21] Páll Melsted,et al. Efficient counting of k-mers in DNA sequences using a bloom filter , 2011, BMC Bioinformatics.
[22] Thomas C. Conway,et al. Succinct data structures for assembling large genomes , 2010, Bioinform..
[23] P. Pevzner,et al. Efficient de novo assembly of single-cell bacterial genomes from short-read data sets , 2011, Nature Biotechnology.
[24] Paul Medvedev,et al. Error correction of high-throughput sequencing datasets with non-uniform coverage , 2011, Bioinform..
[25] Juliane C. Dohm,et al. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems , 2011, Genome Biology.
[26] Carl Kingsford,et al. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers , 2011, Bioinform..
[27] Walter L. Ruzzo,et al. Compression of next-generation sequencing reads aided by highly efficient de novo assembly , 2012, Nucleic acids research.
[28] C. Titus Brown. What does Trinity's In Silico normalization do? , 2012.
[29] S. Tringe,et al. Assembling large, complex environmental metagenomes , 2012, arXiv:1212.2832.
[30] Szymon Grabowski,et al. Disk-based k-mer counting on a PC , 2012, BMC Bioinformatics.
[31] Tim H. Brom,et al. A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data , 2012, arXiv:1203.4802.
[32] Arend Hintze,et al. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs , 2011, Proceedings of the National Academy of Sciences.
[33] Janet Jansson,et al. Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets , 2012, arXiv:1212.0159.
[34] Dominique Lavenier,et al. DSK: k-mer counting with very low memory usage , 2013, Bioinform..
[35] Tavish Armstrong,et al. The Performance of Open Source Applications , 2013.
[36] Colin N. Dewey,et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis , 2013, Nature Protocols.
[37] C. Titus Brown,et al. khmer: Working with Big Data in Bioinformatics , 2013, ArXiv.
[38] Humberto Ortiz-Zuazaga,et al. The khmer software package: enabling efficient sequence analysis , 2014.
[39] Alexander Schliep,et al. Turtle: Identifying frequent k-mers with cache-efficient algorithms , 2013, Bioinform..
[40] Fredrik Vannberg,et al. KAnalyze: a fast versatile pipelined K-mer toolkit , 2014, Bioinform..
[41] Paul Medvedev,et al. Informed and automated k-mer size selection for genome assembly , 2013, Bioinform..
[42] S. Tringe,et al. Tackling soil diversity with the assembly of large, complex metagenomes , 2014, Proceedings of the National Academy of Sciences.