Extracting Semantics from Legacy Scientific Workflows

In this paper we present a method that combines Workflow Instrumentation for Structure Extraction (WISE) with SemanticMap to process ad-hoc legacy workflows written in Python and map the structural skeleton of each workflow to a domain ontology. This mapping lays the foundation for searching scientific workflows with conceptual queries.
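The abstract does not specify how WISE represents a workflow skeleton or how SemanticMap encodes the ontology. As a minimal sketch, assuming the skeleton is a set of directed edges between step (function) names and the ontology is a simple is-a hierarchy, the following Python shows how annotating steps with concepts enables a conceptual query such as "find workflows in which a data-retrieval step precedes an analysis step". Every name here (`ONTOLOGY`, `STEP_TO_CONCEPT`, `WORKFLOWS`, `search`) is a hypothetical illustration, not part of the WISE or SemanticMap APIs.

```python
from collections import deque

# Hypothetical toy ontology: each concept maps to its parent concept (is-a edge).
ONTOLOGY = {
    "SequenceAlignment": "Analysis",
    "StructurePrediction": "Analysis",
    "Visualization": "Task",
    "Analysis": "Task",
    "DataRetrieval": "Task",
    "Task": None,
}

# Hypothetical mapping from step (function) names to ontology concepts.
STEP_TO_CONCEPT = {
    "fetch_pdb": "DataRetrieval",
    "run_blast": "SequenceAlignment",
    "predict_fold": "StructurePrediction",
    "plot_results": "Visualization",
}

# Hypothetical skeletons: directed edges between steps, e.g. derived from traces.
WORKFLOWS = {
    "wf_align": [("fetch_pdb", "run_blast"), ("run_blast", "plot_results")],
    "wf_fold": [("predict_fold", "fetch_pdb")],
}


def is_a(concept, ancestor):
    """True if concept equals ancestor or descends from it; False for None."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


def reachable(edges, start):
    """Steps reachable from start by following one or more edges (BFS)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = set(), deque(adjacency.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(adjacency.get(node, []))
    return seen


def matches(edges, annotations, first, then):
    """True if a step of concept `first` precedes a step of concept `then`."""
    for step, concept in annotations.items():
        if is_a(concept, first):
            for later in reachable(edges, step):
                if is_a(annotations.get(later), then):
                    return True
    return False


def search(workflows, step_to_concept, first, then):
    """Names of workflows whose annotated skeleton satisfies the query."""
    hits = []
    for name, edges in workflows.items():
        steps = {step for edge in edges for step in edge}
        # Annotate only the steps the mapping knows about.
        annotations = {s: step_to_concept[s] for s in steps if s in step_to_concept}
        if matches(edges, annotations, first, then):
            hits.append(name)
    return sorted(hits)


print(search(WORKFLOWS, STEP_TO_CONCEPT, "DataRetrieval", "Analysis"))
```

Because `is_a` follows the hierarchy upward, a query phrased with the general concept `Analysis` also matches steps annotated with narrower concepts such as `SequenceAlignment`; this subsumption is what lets a conceptual query find workflows whose code never mentions the query terms directly.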