AUDIT: approving and tracking updates with dependencies in collaborative databases

Collaborative databases, such as genome databases, often involve extensive curation activities in which collaborators need to interact in order to converge and agree on the content of the data. In a typical scenario, a member of the collaboration makes some updates, and these become visible to all collaborators for possible comments and modifications. At the same time, these updates usually await approval or rejection by the data custodian, based on the related discussion and the content of the data. Unfortunately, the approval and authorization of updates in current databases are based solely on the identity of the user, e.g., via the SQL GRANT and REVOKE commands. In this paper, we present a scalable, cloud-based collaborative database system that supports such collaboration and data curation scenarios. Our system is based on an Update Pending Approval model. In a nutshell, when a collaborator updates a given data item, the item is marked as pending approval until the data custodian approves or rejects the update. Until then, any other collaborator can view and comment on the data while it awaits approval. We implemented the system entirely inside HBase, a cloud-based platform, and conducted extensive experiments showing that it scales well under different workloads.
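The life cycle of an update under the Update Pending Approval model can be illustrated with a small in-memory sketch. This is not the HBase implementation described here: the class `PendingApprovalStore`, its methods, and the rule that each item holds at most one pending update at a time are hypothetical simplifications. The sketch shows the three properties stated above: a collaborator's update becomes visible immediately but is marked as pending, other collaborators can view and comment on it, and only the data custodian can turn it into the approved value or discard it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Status(Enum):
    APPROVED = "approved"
    PENDING = "pending"


@dataclass
class Item:
    approved: Optional[Any] = None   # last value accepted by the custodian
    pending: Optional[Any] = None    # proposed value awaiting a decision
    proposer: Optional[str] = None   # set only while an update is pending
    comments: List[str] = field(default_factory=list)

    @property
    def status(self) -> Status:
        return Status.PENDING if self.proposer is not None else Status.APPROVED


class PendingApprovalStore:
    """Hypothetical sketch of an Update Pending Approval store (not HBase)."""

    def __init__(self, custodian: str) -> None:
        self.custodian = custodian
        self.items: Dict[str, Item] = {}

    def update(self, key: str, value: Any, user: str) -> None:
        # Any collaborator may propose an update; it is visible but pending.
        # Assumption: at most one pending update per item at a time.
        item = self.items.setdefault(key, Item())
        if item.proposer is not None:
            raise ValueError(f"{key!r} already has an update pending approval")
        item.pending, item.proposer = value, user

    def view(self, key: str) -> Dict[str, Any]:
        # Collaborators see the approved value alongside any pending proposal.
        item = self.items[key]
        return {"approved": item.approved, "pending": item.pending,
                "status": item.status.value, "comments": list(item.comments)}

    def comment(self, key: str, user: str, text: str) -> None:
        self.items[key].comments.append(f"{user}: {text}")

    def _decide(self, key: str, user: str, accept: bool) -> None:
        # Only the data custodian may approve or reject a pending update.
        if user != self.custodian:
            raise PermissionError(f"{user} is not the data custodian")
        item = self.items[key]
        if item.proposer is None:
            raise ValueError(f"{key!r} has no pending update")
        if accept:
            item.approved = item.pending
        item.pending, item.proposer = None, None

    def approve(self, key: str, user: str) -> None:
        self._decide(key, user, accept=True)

    def reject(self, key: str, user: str) -> None:
        self._decide(key, user, accept=False)


if __name__ == "__main__":
    store = PendingApprovalStore(custodian="alice")
    store.update("gene:42", "annotation v2", user="bob")
    store.comment("gene:42", "carol", "which source supports this?")
    print(store.view("gene:42")["status"])    # pending
    store.approve("gene:42", user="alice")
    print(store.view("gene:42")["approved"])  # annotation v2
```

Keeping the approved value and the pending proposal side by side means readers never lose the last approved state while discussion is ongoing, and a rejection simply discards the proposal.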
