Annotating the Behavior of Scientific Modules Using Data Examples: A Practical Approach

A major issue that arises when designing scientific experiments (i.e., workflows) is identifying the modules (which are often “black boxes”) that are suitable for performing the steps of the experiment. To assist scientists in identifying suitable modules, semantic annotations have been proposed and used to describe scientific modules. Different facets of a module can be described using semantic annotations. Our experience with scientists from modern sciences such as bioinformatics, biodiversity and astronomy, however, suggests that most of the available semantic annotations are confined to describing the domains of the modules' input and output parameters. Annotations specifying the behavior of the modules, that is, the tasks they perform, are rarely provided. To address this issue, we argue in this paper that data examples are an intuitive and effective means for understanding the behavior of scientific modules. We present a heuristic for automatically generating data examples that annotate scientific modules without relying on the existence of module specifications, and we show the effectiveness of the proposed heuristic through an empirical evaluation that uses real-world scientific modules.

The data examples generated can be utilized in a range of scientific module management operations. To demonstrate this, we present the results of two real-world exercises showing that (i) data examples are an intuitive means for human users to understand the behavior of scientific modules, and (ii) data examples are an effective ingredient for matching scientific modules.
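To make the notion of a data example concrete, the following minimal sketch treats a module as a black-box Python function and a data example as one observed (input, output) pair. It is an illustration under stated assumptions, not the heuristic evaluated in this paper: the candidate inputs are assumed to be supplied by the caller (the abstract does not say how the heuristic selects them), and the matching test, which accepts a candidate module only if it reproduces every example of a target module, is one simple criterion chosen for illustration. The names `DataExample`, `generate_examples`, `behaves_like` and the toy sequence modules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List

# Hypothetical representation: a module is any one-argument callable treated as a black box.
Module = Callable[[Any], Any]


@dataclass(frozen=True)
class DataExample:
    """One observed input/output pair of a black-box module."""
    input_value: Any
    output_value: Any


def generate_examples(module: Module, candidate_inputs: Iterable[Any]) -> List[DataExample]:
    """Invoke the module on each candidate input and record what it returns.

    Assumption: candidate inputs are supplied by the caller (e.g., one value per
    input-domain partition); this does not reproduce the paper's heuristic.
    Inputs on which the module raises an error yield no example.
    """
    examples = []
    for value in candidate_inputs:
        try:
            examples.append(DataExample(value, module(value)))
        except Exception:
            continue  # the module rejected this input; skip it
    return examples


def behaves_like(candidate: Module, examples: List[DataExample]) -> bool:
    """Return True if `candidate` reproduces every recorded example exactly."""
    for ex in examples:
        try:
            if candidate(ex.input_value) != ex.output_value:
                return False
        except Exception:
            return False  # failing on an input the target accepted counts as a mismatch
    return True


# --- Toy "modules" (hypothetical) ---
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def reverse_complement_v2(seq: str) -> str:
    # Independent implementation; unknown characters pass through unchanged.
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def complement_only(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in seq)


if __name__ == "__main__":
    # "XYZ" makes reverse_complement raise KeyError, so only three examples are kept.
    examples = generate_examples(reverse_complement, ["ACGT", "AAC", "GATTACA", "XYZ"])
    for ex in examples:
        print(f"{ex.input_value} -> {ex.output_value}")
    print(behaves_like(reverse_complement_v2, examples))  # True
    print(behaves_like(complement_only, examples))        # False
```

Here the recorded examples are `ACGT -> ACGT`, `AAC -> GTT` and `GATTACA -> TGTAATC`. `complement_only` is rejected because it maps `"ACGT"` to `"TGCA"`, while the independently written `reverse_complement_v2` reproduces all three examples. Exact equality is the simplest possible comparison; real module outputs (e.g., records carrying timestamps or identifiers) would likely need a more tolerant one.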
