A Semiotics Framework for Analyzing Data Provenance Research

Data provenance is the background knowledge that enables a piece of data to be interpreted and used correctly in context. The importance of tracking provenance is widely recognized, as evidenced by significant research in areas including e-science, homeland security, and data warehousing and business intelligence. To advance research on data provenance, however, one must first understand the work conducted to date and identify specific topics that merit further investigation. In this work, we develop a framework based on semiotics theory to assist in analyzing and comparing existing provenance research at the conceptual level. We provide a detailed review of data provenance research and compare and contrast that research using the semiotics framework. We conclude by identifying challenges that will drive future research in this field.
