Supporting Scientific Collaboration Through Class-Based Object Versioning

Reuse of scientific data is central to much of science. Although data produced by individual researchers and groups is often made publicly available, effective sharing is frequently prevented by the lack of common resource discovery mechanisms and by format interoperability issues. Commercial databases operate fixed programmes (e.g. a mortgage plan) on variable data (e.g. interest). In a scientific environment the reverse applies: the methods used to process the data change, while the original data items themselves stay unchanged. Scientists often build on existing work and try different techniques for processing datasets, which requires the methods to change. In this paper, we present a class-based object versioning framework that supports dynamic changes to pipelines while managing dependencies. The framework addresses the management of arbitrary changes made to scripts during a data flow and the association of these changes with the data created.
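To make the idea of associating script changes with the data they create more concrete, the following is a minimal Python sketch. It is not the framework described in this paper. The names `version_of`, `DataItem` and `VersionRegistry` are hypothetical, and the use of a content hash as the version identifier is an assumption made purely for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def version_of(source: str) -> str:
    """Return a short content hash identifying one version of a script.

    Assumption: a version is identified by the hash of the script text.
    """
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


@dataclass
class DataItem:
    """A data item together with a record of how it was produced."""
    name: str
    value: object
    produced_by: Optional[str] = None  # version id of the script, None for raw data
    inputs: tuple = ()                 # names of the items it was derived from


@dataclass
class VersionRegistry:
    """Hypothetical store mapping version ids to script sources."""
    scripts: dict = field(default_factory=dict)

    def register(self, source: str) -> str:
        vid = version_of(source)
        self.scripts[vid] = source
        return vid

    def run(self, source: str, func_name: str, name: str, *inputs: DataItem) -> DataItem:
        """Run one version of a script and tag the result with that version."""
        vid = self.register(source)
        namespace: dict = {}
        exec(source, namespace)  # load this version of the method
        value = namespace[func_name](*(item.value for item in inputs))
        return DataItem(name, value, produced_by=vid,
                        inputs=tuple(item.name for item in inputs))


# The raw data stays unchanged while the processing method changes.
raw = DataItem("raw", [1.0, 2.0, 3.0, 10.0])
registry = VersionRegistry()

mean_v1 = "def process(xs):\n    return sum(xs) / len(xs)\n"
mean_v2 = (
    "def process(xs):\n"
    "    xs = sorted(xs)[1:-1]  # drop the smallest and largest value\n"
    "    return sum(xs) / len(xs)\n"
)

a = registry.run(mean_v1, "process", "mean_a", raw)
b = registry.run(mean_v2, "process", "mean_b", raw)
print(a.value, a.produced_by)
print(b.value, b.produced_by)
```

In this sketch each derived item carries the identifier of the exact script version that produced it, so two results computed from the same unchanged input can be traced back to the differing methods. A real system would also need to track dependencies between classes, which this example does not attempt.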
