A Data Model for Analyzing User Collaborations in Workflow-Driven e-Science

Scientific discoveries are often the result of methodical execution of many interrelated scientific workflows, where workflows and datasets published by one set of users can be used by other users to perform subsequent analyses, leading to implicit or explicit collaboration. In this paper, we describe a data model for "collaborative provenance" that extends common workflow provenance models by introducing attributes for characterizing the nature of user collaborations as well as their strength (or weight). In addition, through the implementation of a real-world bioinformatics use case scenario and an associated collaborative provenance database, we demonstrate and evaluate the effectiveness of our model in understanding and analyzing user collaboration in scientific discoveries driven by scientific workflows.

Key Words: Collaborative e-Science, user collaborations, scientific workflow systems, provenance, workflow runs, data publication, querying.
