Revision provenance in text documents of asynchronous collaboration

Many text documents today are edited collaboratively, often through multiple small changes. The problem we consider in this paper is how to find the provenance of a specific part of interest in a document. A full revision history, represented as a version tree, records every update made to the document, but most of these updates may apply to other parts of the document and hence not be relevant to the provenance question at hand. In this paper, we propose the notion of a revision unit as a flexible unit for capturing the necessary provenance. We demonstrate through experiments that revision units keep only relevant updates in the provenance representation and adjust flexibly to the updates reflected in the version tree.
