MaRe: Processing Big Data with application containers on Apache Spark
Ola Spjuth | Marco Capuccini | Salman Toor | Martin Dahlö
[1] Randy H. Katz,et al. Above the Clouds: A Berkeley View of Cloud Computing , 2009 .
[2] Arthur Dalby,et al. Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited , 1992, J. Chem. Inf. Comput. Sci..
[3] Shaoliang Peng,et al. Bioinformatics applications on Apache Spark , 2018, GigaScience.
[4] Thomas Kluyver,et al. Jupyter Notebooks - a publishing format for reproducible computational workflows , 2016, ELPUB.
[5] M. DePristo,et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.
[6] Reynold Xin,et al. Apache Spark , 2016 .
[7] Hanchuan Peng,et al. Bioimage informatics: a new area of engineering biology , 2008, Bioinform..
[8] Gabor T. Marth,et al. A global reference for human genetic variation , 2015, Nature.
[9] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[10] Alejandra N. González-Beltrán,et al. PhenoMeNal: processing and analysis of metabolomics data in the cloud , 2018, bioRxiv.
[11] F. Collins,et al. Shattuck lecture--medical and societal consequences of the Human Genome Project. , 1999, The New England journal of medicine.
[12] George Papadatos,et al. SureChEMBL: a large-scale, chemically annotated patent document database , 2015, Nucleic Acids Res..
[13] K Osterlund,et al. Unexpected binding mode of a cyclic sulfamide HIV-1 protease inhibitor. , 1997, Journal of medicinal chemistry.
[14] Milind A. Bhandarkar,et al. MapReduce programming with Apache Hadoop , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[15] M. Schatz,et al. Big Data: Astronomical or Genomical? , 2015, PLoS biology.
[16] L. Kruglyak. Prospects for whole-genome linkage disequilibrium mapping of common disease genes , 1999, Nature Genetics.
[17] Gonçalo R. Abecasis,et al. The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..
[18] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.
[19] Jeremy Leipzig,et al. A review of bioinformatic pipeline frameworks , 2016, Briefings Bioinform..
[20] Ola Spjuth,et al. Tracking the NGS revolution: managing life science research on shared high-performance computing clusters , 2018, GigaScience.
[21] Robert Stevens,et al. A Survey of Bioinformatics Database and Software Usage through Mining the Literature , 2016, PLoS ONE.
[22] Emad A. Mohammed,et al. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends , 2014, BioData Mining.
[23] Rajkumar Buyya,et al. Data Storage Management in Cloud Environments , 2017, ACM Comput. Surv..
[24] Ryan G. Coleman,et al. ZINC: A Free Tool to Discover Chemistry for Biology , 2012, J. Chem. Inf. Model..
[25] Ola Spjuth,et al. Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles , 2016, Journal of Cheminformatics.
[26] Martin Odersky,et al. An Overview of the Scala Programming Language , 2004 .
[27] Günther Specht,et al. Cloudgene: A graphical execution platform for MapReduce programs on private and public clouds , 2012, BMC Bioinformatics.
[28] Ross Ihaka,et al. R: A language for data analysis and graphics , 1996 .
[29] Ulysses G. J. Balis,et al. The growing need for microservices in bioinformatics , 2016, Journal of pathology informatics.
[30] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[31] Ola Spjuth,et al. Large-scale virtual screening on public cloud resources with Apache Spark , 2017, Journal of Cheminformatics.
[32] Ching-Hsien Hsu,et al. GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers , 2017, Computing.
[33] Xiaoqiao Meng,et al. Delay tails in MapReduce scheduling , 2012, SIGMETRICS '12.
[34] Yanli Wang,et al. Structure-Based Virtual Screening for Drug Discovery: a Problem-Centric Review , 2012, The AAPS Journal.
[35] Geoffrey C. Fox,et al. MapReduce in the Clouds for Science , 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.
[36] Ola Spjuth,et al. Galaxy-Kubernetes integration: scaling bioinformatics workflows in the cloud , 2018, bioRxiv.
[37] Mark McGann,et al. FRED Pose Prediction and Virtual Screening Accuracy , 2011, J. Chem. Inf. Model..
[38] Leonard J. Foster,et al. At the Intersection of Proteomics and Big Data Science. , 2017, Clinical chemistry.
[39] Paolo Di Tommaso,et al. Nextflow enables reproducible computational workflows , 2017, Nature Biotechnology.
[40] Peter M. Rice,et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants , 2009, Nucleic acids research.
[41] Ola Spjuth,et al. SNIC Science Cloud (SSC): A National-Scale Cloud Infrastructure for Swedish Academia , 2017, 2017 IEEE 13th International Conference on e-Science (e-Science).
[42] Robert C. Elston,et al. Defining “mutation” and “polymorphism” in the era of personal genomics , 2015, BMC Medical Genomics.
[43] Ola Spjuth,et al. Efficient iterative virtual screening with Apache Spark and conformal prediction , 2018, Journal of Cheminformatics.
[44] A. Helwak,et al. High Guanine and Cytosine Content Increases mRNA Levels in Mammalian Cells , 2006, PLoS biology.
[45] Duen Horng Chau,et al. Building Big Data Processing and Visualization Pipeline through Apache Zeppelin , 2018, PEARC.
[46] Ola Spjuth,et al. Container-based bioinformatics with Pachyderm , 2018, bioRxiv.
[47] Rolf Apweiler,et al. The European Bioinformatics Institute in 2018: tools, infrastructure and training , 2018, Nucleic Acids Res..
[48] Long Zheng,et al. More convenient more overhead: the performance evaluation of Hadoop streaming , 2011, RACS.
[49] Richard Durbin,et al. Fast and accurate short read alignment with Burrows-Wheeler transform , 2009 .
[50] Zhao Zhang,et al. Rethinking Data-Intensive Science Using Scalable Analytics Systems , 2015, SIGMOD Conference.
[51] Gonçalo R. Abecasis,et al. The variant call format and VCFtools , 2011, Bioinform..