Automatic Metadata Annotation through Reconstructing Provenance

Annotating datasets with metadata is an important part of organizing and curating data. However, it is a time-consuming process that is often not done rigorously. In this paper, we propose a new approach to annotating datasets through the use of reconstructed provenance. We give a detailed survey of related work in this area and provide an overview of our approach, both for reconstructing provenance and for using that provenance to automatically annotate datasets with metadata. The approach builds on existing work in AI planning and change detection algorithms.
