Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files
Ingo Ruczinski | Fusheng Wang | Jingjing Gao | Zhaohui S. Qin | Terri H Beaty | Xiaobo Sun | Peng Jin | Celeste Eng | Esteban G Burchard | Rasika A Mathias | Kathleen Barnes
[1] Zachary A. Szpiech,et al. A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome , 2016, Nature Communications.
[2] Karl Gruber. Google for genomes , 2014, Nature Biotechnology.
[3] Life Technologies,et al. A map of human genome variation from population-scale sequencing , 2011 .
[4] M. Balazinska,et al. A Study of Skew in MapReduce Applications , 2011 .
[5] Fabian A. Buske,et al. VariantSpark: population scale clustering of genotype information , 2015, BMC Genomics.
[6] Ivan Merelli,et al. Managing, Analysing, and Integrating Big Data in Medical Bioinformatics: Open Problems and Future Perspectives , 2014, BioMed research international.
[7] Tom White,et al. Hadoop: The Definitive Guide , 2009 .
[8] Davide Anguita,et al. Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf , 2015, INNS Conference on Big Data.
[9] Hui Guo,et al. VSEAMS: a pipeline for variant set enrichment analysis using summary GWAS data identifies IKZF3, BATF and ESRRA as key transcription factors in type 1 diabetes , 2014, Bioinform..
[10] Yike Guo,et al. CGDM: collaborative genomic data model for molecular profiling data using NoSQL , 2016, Bioinform..
[11] Peggy L Peissig,et al. SeqHBase: a big data toolset for family based sequencing data analysis , 2015, Journal of Medical Genetics.
[12] John L. Gustafson,et al. Reevaluating Amdahl's law , 1988, CACM.
[13] Wilson C. Hsieh,et al. Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.
[14] Gianluigi Zanetti,et al. SEAL: a distributed short read mapping and duplicate removal tool , 2011, Bioinform..
[15] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[16] Gonçalo R. Abecasis,et al. The variant call format and VCFtools , 2011, Bioinform..
[17] M. N. Vora,et al. Hadoop-HBase for large-scale data , 2011, Proceedings of 2011 International Conference on Computer Science and Network Technology.
[18] Wei Zhou,et al. MetaSpark: a spark‐based distributed processing tool to recruit metagenomic reads to reference genomes , 2017, Bioinform..
[19] Eija Korpelainen,et al. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud , 2012, Bioinform..
[20] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[21] Zhengwei Zhu,et al. FR-HIT, a very fast program to recruit metagenomic reads to homologous reference genomes , 2011, Bioinform..
[22] Patrick E. O'Neil,et al. The log-structured merge-tree (LSM-tree) , 1996, Acta Informatica.
[23] Hao Wu,et al. A novel statistical method for quantitative comparison of multiple ChIP-seq datasets , 2015, Bioinform..
[24] Emad A. Mohammed,et al. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends , 2014, BioData Mining.
[25] Philippe Flajolet,et al. An introduction to the analysis of algorithms , 1995 .
[26] Marek S. Wiewiórka,et al. SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision , 2014, Bioinform..
[27] M. Schatz,et al. Searching for SNPs with cloud computing , 2009, Genome Biology.
[28] Heng Li,et al. Tabix: fast retrieval of sequence features from generic TAB-delimited files , 2011, Bioinform..
[29] M. DePristo,et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.
[30] D. Altshuler,et al. A map of human genome variation from population-scale sequencing , 2010, Nature.
[31] Sandeep Tata,et al. BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters , 2013, Bioinform..
[32] Patrick Valduriez,et al. Principles of Distributed Database Systems , 1990 .
[33] Ola Spjuth,et al. A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data , 2015, GigaScience.
[34] David A. Patterson,et al. ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing , 2013 .
[35] Michael C. Schatz,et al. CloudBurst: highly sensitive read mapping with MapReduce , 2009, Bioinform..
[36] Manuel A. R. Ferreira,et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.
[37] Donald Miner,et al. MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems , 2012 .
[38] Y. Danieli. Guide , 2005 .
[39] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.
[40] Jan Fostier,et al. Halvade: scalable sequence analysis with MapReduce , 2015, Bioinform..