Cloud4SNP: Distributed Analysis of SNP Microarray Data on the Cloud

Pharmacogenomics studies how patients' genetic variation affects drug response, searching for correlations between gene expression or Single Nucleotide Polymorphisms (SNPs) in a patient's genome and the toxicity or efficacy of a drug. SNP data produced by microarray platforms must be preprocessed and analyzed to find correlations between the presence or absence of SNPs and the toxicity or efficacy of a drug. Because of the large number of samples and the high resolution of the instruments, the data to be analyzed can be very large and require high-performance computing. This paper presents the design and experimental evaluation of Cloud4SNP, a novel Cloud-based bioinformatics tool for the parallel preprocessing and statistical analysis of pharmacogenomics SNP microarray data. The experimental evaluation shows good speed-up and scalability. Moreover, because the tool runs on a Cloud platform, it can elastically meet the requirements of small as well as very large pharmacogenomics studies.
