CloudGT: A High Performance Genome Analysis Toolkit Leveraging Pipeline Optimization on Spark
Zongze Wu | Shoubin Dong | Cheng Liu | Lingqi Zhang | Anghong Xiao