When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data
[1] Daniel N. Baker, et al. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts, 2018, Genome Biology.
[2] Hooman Zabeti, et al. Improving MinHash via the containment index with applications to metagenomic analysis, 2019, Appl. Math. Comput.
[3] Xiaoyong Du, et al. Persistent Data Sketching, 2015, SIGMOD Conference.
[4] Justin Chu, et al. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter, 2016, bioRxiv.
[5] Robert Nowak, et al. De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application, 2018, BMC Bioinformatics.
[6] Michael Mitzenmacher, et al. Less Hashing, Same Performance: Building a Better Bloom Filter, 2006, ESA.
[7] Carl Kingsford, et al. Sketching and Sublinear Data Structures in Genomics, 2019, Annual Review of Biomedical Data Science.
[8] Brian D. Ondov, et al. Mash: fast genome and metagenome distance estimation using MinHash, 2015, Genome Biology.
[9] Paul Medvedev, et al. Informed and automated k-mer size selection for genome assembly, 2013, Bioinform.
[10] XiaoFei Zhao, et al. BinDash, software for fast genome distance estimation on a typical personal laptop, 2018, Bioinform.
[11] Walter L. Ruzzo, et al. Compression of next-generation sequencing reads aided by highly efficient de novo assembly, 2012, Nucleic acids research.
[12] Carl Kingsford, et al. Fast Search of Thousands of Short-Read Sequencing Experiments, 2015, Nature Biotechnology.
[13] Andrei Z. Broder, et al. Identifying and Filtering Near-Duplicate Documents, 2000, CPM.
[14] Luca Trevisan, et al. Counting Distinct Elements in a Data Stream, 2002, RANDOM.
[15] Philippe Flajolet, et al. Probabilistic Counting Algorithms for Data Base Applications, 1985, J. Comput. Syst. Sci.
[16] B. Langmead, et al. Lighter: fast and memory-efficient sequencing error correction without counting, 2014, Genome Biology.
[17] Ananth Kalyanaraman, et al. FastEtch: A Fast Sketch-Based Assembler for Genomes, 2019, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[18] Bin Li, et al. HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift, 2017, 2017 IEEE International Conference on Data Mining (ICDM).
[19] Michael Roberts, et al. Reducing storage requirements for biological sequence comparison, 2004, Bioinform.
[20] Anna Paola Carrieri, et al. Streaming histogram sketching for rapid microbiome analytics, 2018, bioRxiv.
[21] Serafim Batzoglou, et al. A hybrid cloud read aligner based on MinHash and kmer voting that preserves privacy, 2017, Nature Communications.
[22] Prashant Pandey, et al. Locality-sensitive hashing for the edit distance, 2019, bioRxiv.
[23] Alexander Hall, et al. HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm, 2013, EDBT '13.
[24] Li Fan, et al. Summary cache: a scalable wide-area web cache sharing protocol, 2000, TNET.
[25] Srinivas Aluru, et al. A Fast Adaptive Algorithm for Computing Whole-Genome Homology Maps, 2018.
[26] Michael A. Bender, et al. deBGR: an efficient and near-exact representation of the weighted de Bruijn graph, 2017, Bioinform.
[27] Bin Fan, et al. Cuckoo Filter: Practically Better Than Bloom, 2014, CoNEXT.
[28] Michael A. Bender, et al. Squeakr: An Exact and Approximate k-mer Counting System, 2017, bioRxiv.
[29] Hamid Mohamadi, et al. ntCard: a streaming algorithm for cardinality estimation in genomics data, 2017, Bioinform.
[30] Andrei Z. Broder, et al. On the resemblance and containment of documents, 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).
[31] Sergey Koren, et al. Mash Screen: high-throughput sequence containment estimation for genome discovery, 2019, Genome Biology.
[32] Yongge Wang, et al. Randomization and Approximation Techniques in Computer Science, 1997, Lecture Notes in Computer Science.
[33] Daniel N. Baker, et al. Dashing: fast and accurate genomic distances with HyperLogLog, 2018, Genome Biology.
[34] Chirag Jain, et al. A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases, 2017, RECOMB.
[35] Michael A. Bender, et al. A General-Purpose Counting Filter: Making Every Bit Count, 2017, SIGMOD Conference.
[36] Yongchao Liu, et al. Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data, 2013, Bioinform.
[37] Xiaolong Wu, et al. BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads, 2014, Bioinform.
[38] Anna Paola Carrieri, et al. A Fast Machine Learning Workflow for Rapid Phenotype Prediction from Whole Shotgun Metagenomes, 2019, AAAI.
[39] Huzefa Rangwala, et al. MC-MinH: Metagenome Clustering using Minwise based Hashing, 2013, SDM.
[40] G. Smith, et al. Rapid bacterial whole-genome sequencing to enhance diagnostic and public health microbiology, 2013, JAMA internal medicine.
[41] Páll Melsted, et al. Efficient counting of k-mers in DNA sequences using a bloom filter, 2011, BMC Bioinformatics.
[42] Graham Cormode, et al. Data Sketching, 2017, ACM Queue.
[43] Wael Hassan Gomaa, et al. A Survey of Text Similarity Approaches, 2013.
[44] Daniel Standage, et al. The khmer software package: enabling efficient nucleotide sequence analysis, 2015, F1000Research.
[45] Peter J. Haas, et al. On synopses for distinct-value estimation under multiset operations, 2007, SIGMOD '07.
[46] Edith Cohen, et al. Summarizing data using bottom-k sketches, 2007, PODC '07.
[47] Graham Cormode, et al. An improved data stream summary: the count-min sketch and its applications, 2004, J. Algorithms.
[48] Ehsan Eydi, et al. Buffered Count-Min Sketch, 2017.
[49] Edith Cohen, et al. Size-Estimation Framework with Applications to Transitive Closure and Reachability, 1997, J. Comput. Syst. Sci.
[50] Ping Li, et al. b-Bit minwise hashing, 2009, WWW '10.
[51] Chirag Jain, et al. A fast adaptive algorithm for computing whole-genome homology maps, 2018, bioRxiv.
[52] Georges Hébrail, et al. Sliding HyperLogLog: Estimating Cardinality in a Data Stream over a Sliding Window, 2010, 2010 IEEE International Conference on Data Mining Workshops.
[53] J. Landolin, et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing, 2014, Nature Biotechnology.
[54] Michael A. Bender, et al. Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index, 2018, Cell systems.
[55] Will P. M. Rowe, et al. Indexed variation graphs for efficient and accurate resistome profiling, 2018, bioRxiv.
[56] Phelim Bradley, et al. Ultra-fast search of all deposited bacterial and viral genomic data, 2019, Nature Biotechnology.
[57] Yanling Lin, et al. Sequences Dimensionality-Reduction by K-mer Substring Space Sampling Enables Effective Resemblance- and Containment-Analysis for Large-Scale omics-data, 2019, bioRxiv.
[58] Qingpeng Zhang, et al. These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure, 2013, PloS one.
[59] Heng Li, et al. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences, 2015, Bioinform.
[60] P. Flajolet, et al. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm, 2007.
[61] David A. Patterson, et al. Attack of the killer microseconds, 2017, Commun. ACM.
[62] Jordan A. Fish, et al. Xander: employing a novel method for efficient gene-targeted metagenomic assembly, 2015, Microbiome.
[63] Graham Cormode, et al. Data sketching, 2017, Commun. ACM.
[64] Heng Li, et al. Minimap2: pairwise alignment for nucleotide sequences, 2017, Bioinform.
[65] Tim Head, et al. Binder 2.0 - Reproducible, interactive, sharable environments for science at scale, 2018, SciPy.
[66] Hilde van der Togt, et al. Publisher's Note, 2003, J. Netw. Comput. Appl.
[67] Ryan P. Adams, et al. A Bayesian Nonparametric View on Count-Min Sketch, 2018, NeurIPS.
[68] Rayan Chikhi, et al. Space-efficient and exact de Bruijn graph representation based on a Bloom filter, 2012, Algorithms for Molecular Biology.
[69] Roderick Bovee, et al. Finch: a tool adding dynamic abundance filtering to genomic MinHashing, 2018, J. Open Source Softw.
[70] Brian Bushnell, et al. BBMap: A Fast, Accurate, Splice-Aware Aligner, 2014.
[71] Ping Li, et al. One Permutation Hashing, 2012, NIPS.
[72] Prashant Pandey, et al. Locality-sensitive hashing for the edit distance, 2019, Bioinform.
[73] Burton H. Bloom, et al. Space/time trade-offs in hash coding with allowable errors, 1970, CACM.
[74] Luiz Irber, et al. sourmash: a library for MinHash sketching of DNA, 2016, J. Open Source Softw.
[75] Leonid Oliker, et al. Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[76] Srinivas Aluru, et al. A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases, 2017, bioRxiv.