Provenance-based validation of e-science experiments

E-science experiments typically involve many distributed services maintained by different organisations. After an experiment has been executed, a scientist may want to verify that the execution was performed correctly, or that it is compatible with existing experimental criteria or standards that were not necessarily anticipated before execution. Scientists may also want to review and verify experiments performed by their colleagues. Today's e-science systems offer no framework for validating such experiments, so users have to rely on error checking performed by the services themselves or adopt other ad hoc methods. This paper introduces a platform-independent framework for validating workflow executions. The validation relies on reasoning over the documented provenance of experiment results and over semantic descriptions of services advertised in a registry. This validation process ensures that experiments are performed correctly, and thus that the results generated are meaningful. The framework is tested in a bioinformatics application that performs protein compressibility analysis.
