Optimising Scientific Workflow Execution Using Desktops, Clusters and Clouds

Scientific gateways are among the most important tools for designing and running experiments. Although they can be run locally, they are mainly offered through online interfaces backed by cloud computing instances. Our studies show that cloud machines are not the best solution in every situation, and that the advantages of heterogeneous cluster machines should be considered when scheduling experiments: doing so saves both financial and computational resources, avoids network delays, and lets the infrastructure be managed as needed. We ran a variety of bioinformatics experiment scenarios on three different sets of machines: a workstation, a cluster, and a cloud platform. Then, applying Support Vector Machine (SVM) regression, a nonlinear regression technique, to the results, we determine the best machine configuration in terms of processing time for given input parameters. The results show an approximation with a small error, which identifies with good confidence the appropriate infrastructure to host an instance of the Galaxy framework, used as the case study. With a system based on diverse environments, researchers can properly schedule each set of experiments.
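The selection step described above can be illustrated with a minimal sketch. It assumes scikit-learn's `SVR` with an RBF kernel, one regression model per infrastructure, and two hypothetical input parameters (dataset size and thread count). The timings are synthetic and made up for illustration; they are not the measurements from the study, and the function names and hyperparameters are assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_runtime_models(runs):
    """Fit one RBF-kernel SVR per infrastructure.

    `runs` maps an infrastructure name to (X, y), where X holds the input
    parameters of each run and y the measured processing time in seconds.
    """
    models = {}
    for name, (X, y) in runs.items():
        # Scaling the features first matters for RBF kernels.
        # C and epsilon are illustrative values, not tuned ones.
        model = make_pipeline(
            StandardScaler(),
            SVR(kernel="rbf", C=100.0, epsilon=0.1),
        )
        model.fit(np.asarray(X, dtype=float), np.asarray(y, dtype=float))
        models[name] = model
    return models


def predict_best(models, params):
    """Return the infrastructure with the lowest predicted processing time."""
    x = np.asarray(params, dtype=float).reshape(1, -1)
    predicted = {name: float(m.predict(x)[0]) for name, m in models.items()}
    best = min(predicted, key=predicted.get)
    return best, predicted


# Synthetic, made-up timings (assumption): columns are dataset size, threads.
rng = np.random.default_rng(0)
sizes = rng.uniform(1, 100, size=60)
threads = rng.integers(1, 9, size=60)
X = np.column_stack([sizes, threads])
runs = {
    "workstation": (X, 2.0 * sizes / threads + 1.0),
    "cluster": (X, 1.0 * sizes / threads + 10.0),
    "cloud": (X, 0.8 * sizes / threads + 25.0),
}

models = fit_runtime_models(runs)
best, predicted = predict_best(models, [50.0, 4])
print(best, predicted)
```

Fitting a separate model per infrastructure keeps each regression simple and makes the comparison a plain minimum over predicted times; a single model with the infrastructure encoded as a feature would be an equally plausible design.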
