An Integrated Framework for Parameter-based Optimization of Scientific Workflows

Data analysis processes in scientific applications can be expressed as coarse-grained workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters, such as the grouping of workflow components and their mapping to machines, do not affect the accuracy of the output, others may require trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework that supports performance optimization along multiple dimensions of this parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
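To make the "search in a multi-dimensional parameter space" framing concrete, the following is a minimal sketch, not the framework described in the paper. It assumes a hypothetical parameter space with two accuracy-neutral parameters (num_nodes for component-to-machine mapping, chunk_size for grouping/partitioning) and one quality-trading parameter (resolution), plus toy cost and quality models invented for illustration. It exhaustively enumerates configurations, discards those below a user-specified quality threshold, and keeps the fastest remaining one.

```python
from itertools import product

# Hypothetical parameter space (an illustrative assumption, not from the paper):
# num_nodes and chunk_size do not change the output; resolution trades accuracy for speed.
PARAMETER_SPACE = {
    "num_nodes": [1, 2, 4, 8],         # mapping of workflow components to machines
    "chunk_size": [256, 512, 1024],    # grouping / data partitioning granularity
    "resolution": [1.0, 0.5, 0.25],    # fraction of full input resolution (quality knob)
}


def estimated_runtime(cfg):
    """Toy cost model (assumed): parallel work plus per-node and per-chunk overheads."""
    work = 1000.0 * cfg["resolution"] ** 2
    overhead = 5.0 * cfg["num_nodes"] + 2000.0 / cfg["chunk_size"]
    return work / cfg["num_nodes"] + overhead


def estimated_quality(cfg):
    """Toy quality model (assumed): only the resolution parameter affects accuracy."""
    return cfg["resolution"]


def search(space, min_quality):
    """Exhaustive search for the fastest configuration meeting the quality threshold."""
    names = list(space)
    best_cfg, best_time = None, float("inf")
    for values in product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        if estimated_quality(cfg) < min_quality:
            continue  # violates the user's accuracy requirement
        t = estimated_runtime(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time


if __name__ == "__main__":
    for q in (1.0, 0.5, 0.25):
        print(q, search(PARAMETER_SPACE, q))
```

Lowering the quality threshold lets the search pick a cheaper resolution, which in turn shifts the best choice of the accuracy-neutral parameters (here, fewer nodes once the per-node overhead dominates). Exhaustive enumeration is used only for clarity; a realistic optimizer would need a smarter search strategy for large spaces.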
