Citable by Design - A Model for Making Data in Dynamic Environments Citable

Data forms the basis of research publications. Yet researchers still focus on the paper-based publication, and data is seen as a supplement that may be offered as a download, often without further comment. Validation, verification, reproduction and reuse of existing knowledge are only possible when the research data is accessible and identifiable. For this reason, precise data citation mechanisms are required that allow experiments to be reproduced on exactly the same data basis. In this paper, we propose a model that makes it possible to cite, identify and reference specific data sets within their dynamic environments. Our model allows the selection of subsets that support experiment verification and the reuse of results in different contexts. The approach is based on assigning persistent identifiers to timestamped queries, which are executed against timestamped and versioned databases. This allows a transparent implementation and provides a scalable means of ensuring that identical result sets are delivered when the query is re-invoked.
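To illustrate the core idea, the following is a minimal sketch of a persistent identifier assigned to a timestamped query over a versioned table. It is not the implementation of the proposed model. It assumes an in-memory SQLite database, a logical clock in place of wall-clock timestamps, and validity intervals stored per record. All names (`CitableStore`, `readings`, `citations`, the `pid:` prefix) are hypothetical, and the use of a result hash as a verification check is an additional assumption.

```python
import hashlib
import sqlite3
import uuid

OPEN_END = 2**62  # assumed sentinel meaning "this version is still current"


def _digest(rows):
    """Hash a result set so that a later re-execution can be verified."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


class CitableStore:
    """Hypothetical versioned table plus a registry of citable queries."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        # Every record version carries its validity interval [valid_from, valid_to).
        self.db.execute(
            "CREATE TABLE readings "
            "(key TEXT, value REAL, valid_from INTEGER, valid_to INTEGER)"
        )
        # A citation stores the query, its execution timestamp and a result hash.
        self.db.execute(
            "CREATE TABLE citations "
            "(pid TEXT PRIMARY KEY, predicate TEXT, ts INTEGER, result_hash TEXT)"
        )
        self.clock = 0  # logical clock standing in for real timestamps

    def _tick(self):
        self.clock += 1
        return self.clock

    def upsert(self, key, value):
        """Insert a new version; the previous version is closed, never overwritten."""
        now = self._tick()
        self.db.execute(
            "UPDATE readings SET valid_to = ? WHERE key = ? AND valid_to = ?",
            (now, key, OPEN_END),
        )
        self.db.execute(
            "INSERT INTO readings VALUES (?, ?, ?, ?)", (key, value, now, OPEN_END)
        )

    def delete(self, key):
        """Mark the current version as ended instead of removing it."""
        now = self._tick()
        self.db.execute(
            "UPDATE readings SET valid_to = ? WHERE key = ? AND valid_to = ?",
            (now, key, OPEN_END),
        )

    def _run(self, predicate, ts):
        # The predicate is interpolated directly and must be trusted: sketch only.
        sql = (
            "SELECT key, value FROM readings "
            f"WHERE ({predicate}) AND valid_from <= ? AND valid_to > ? "
            "ORDER BY key"
        )
        return self.db.execute(sql, (ts, ts)).fetchall()

    def cite(self, predicate):
        """Execute a query now and register it under a new identifier."""
        ts = self.clock
        rows = self._run(predicate, ts)
        pid = "pid:" + uuid.uuid4().hex
        self.db.execute(
            "INSERT INTO citations VALUES (?, ?, ?, ?)",
            (pid, predicate, ts, _digest(rows)),
        )
        return pid, rows

    def resolve(self, pid):
        """Re-execute the cited query as of its original timestamp."""
        row = self.db.execute(
            "SELECT predicate, ts, result_hash FROM citations WHERE pid = ?", (pid,)
        ).fetchone()
        if row is None:
            raise KeyError(pid)
        predicate, ts, expected = row
        rows = self._run(predicate, ts)
        if _digest(rows) != expected:
            raise RuntimeError("re-executed result differs from the cited result")
        return rows
```

In this sketch, calling `cite("value > 0")` returns an identifier together with the current result set. Later calls to `upsert` or `delete` only add or close versions, so `resolve` with that identifier evaluates the same predicate against the state valid at the stored timestamp and yields the original rows.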