Customizing Scientific Data Analytic Platforms via SaaS Approach

In the era of data-driven scientific discovery, researchers need customizable scientific data analytic platforms to conduct their own data-intensive analyses. Science gateways have been a viable solution for enabling scientists to run scientific simulations, data analysis, and visualization through their web browsers. However, most science gateway frameworks are designed to integrate commonly used software tools and datasets in a specific science domain, so implementing the variability that lab-specific data processing workflows require takes significant effort. In this paper we introduce a customization framework based on a multitenancy architecture (MTA) that can greatly accelerate the customization cycle of science gateway systems. Each tenant has its own workspace, which assembles the software stack and tools needed for that tenant's specific data analytics tasks. Through this framework, developers can import their domain-specific analysis pipeline scripts and mash up relevant templates, including GUI templates, tool recipes, and workspace templates, to generate both the workspace and the web interface for running these application workflows and visualizing their output, without writing extra wrapper code.
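The per-tenant workspace idea can be illustrated with a minimal Python sketch. It is not the framework's actual API: every name here (`ToolRecipe`, `Workspace`, `build_workspace`, `import_pipeline`, the tenant names, the script path, and the placeholder version strings) is a hypothetical stand-in chosen for illustration. The sketch assumes a workspace template is simply a list of tool recipes that tenants may override, and that a GUI form can be derived from a pipeline's parameter names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolRecipe:
    """Hypothetical recipe naming one tool and the version to install."""
    name: str
    version: str


@dataclass
class Workspace:
    """Hypothetical per-tenant workspace: a software stack plus imported pipelines."""
    tenant: str
    tools: dict = field(default_factory=dict)      # tool name -> ToolRecipe
    pipelines: dict = field(default_factory=dict)  # pipeline name -> script path
    forms: dict = field(default_factory=dict)      # pipeline name -> GUI form spec

    def add_tool(self, recipe):
        # A later recipe with the same tool name replaces the earlier one.
        self.tools[recipe.name] = recipe

    def import_pipeline(self, name, script, params, gui_template):
        """Register an existing script and derive a form spec from a GUI template."""
        self.pipelines[name] = script
        self.forms[name] = {
            "template": gui_template,
            "fields": [{"name": p, "widget": "text"} for p in params],
        }


def build_workspace(tenant, workspace_template, overrides=()):
    """Combine a shared workspace template with tenant-specific tool recipes."""
    ws = Workspace(tenant=tenant)
    for recipe in workspace_template:
        ws.add_tool(recipe)
    for recipe in overrides:  # tenant-specific recipes replace template defaults
        ws.add_tool(recipe)
    return ws


# Two tenants share one template; only lab_a overrides a tool version.
# Version strings are placeholders, not real release numbers.
shared_template = [ToolRecipe("aligner", "1.0"), ToolRecipe("variant_caller", "1.0")]
lab_a = build_workspace("lab_a", shared_template,
                        overrides=[ToolRecipe("variant_caller", "2.0")])
lab_b = build_workspace("lab_b", shared_template)

# lab_a imports its own script; the form is generated from the parameter names.
lab_a.import_pipeline("exome_calling", "pipelines/exome.sh",
                      params=["fastq_dir", "reference"], gui_template="basic_form")
```

In this sketch each tenant's customization is plain data (recipes, a script path, and parameter names), and the form specification is derived from it, which is one way to read the claim that no extra wrapper code has to be written per pipeline.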
