Letter from the Editor-in-Chief TCDE Chair Election

Sharing structured data today requires agreeing on a standard schema, then mapping and cleaning all of the data to achieve a single queryable mediated instance. However, in settings where structured data is collaboratively authored by a large community, such as in the sciences, there is seldom consensus about how the data should be represented, what is correct, and which sources are authoritative. Moreover, such data is dynamic: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new architecture and consistency model for such settings, based on the needs of data sharing in the life sciences. A key aspect of ORCHESTRA’s design is that the provenance of data is recorded at every step. In this paper, we describe ORCHESTRA’s provenance model and architecture, emphasizing its integral use of provenance in enforcing trust policies and translating updates efficiently.
