Digital archives are dedicated to the long-term preservation of electronic information and are mandated to enable sustained access despite a rapidly changing information infrastructure. Current archival approaches build on standardized data formats and simple metadata mechanisms for collection management, but do not involve high-level conceptual models or knowledge representations. This results in serious limitations, not only for expressing various kinds of information and knowledge about the archived data, but also for creating infrastructure-independent, self-validating, and self-instantiating archives. To overcome these limitations, we first propose a scalable XML-based archival infrastructure built on standard tools, and then show how this architecture can be extended to a model-based framework in which higher-level knowledge representations become an integral part of the archive and of the ingestion and migration processes. This allows us to maximize infrastructure independence by archiving generic, executable specifications of (1) archival constraints (i.e., model validators) and (2) archival transformations that are part of the ingestion process. The proposed architecture facilitates the construction of self-validating and self-instantiating knowledge-based archives. We illustrate the overall approach and report on initial experience with a sample collection from a collaboration with the National Archives and Records Administration (NARA).
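To make the notion of a self-validating archive concrete, the following is a minimal sketch and not the formalism used in this work. It assumes a hypothetical XML layout in which the archive carries its own constraints in a `<constraints>` section next to the archived `<collection>`, so that a generic validator needs only the archive itself to check it. The element names (`archive`, `constraints`, `required`, `record`) and the sample records are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical archive: the constraints travel with the data they govern.
ARCHIVE_XML = """
<archive>
  <constraints>
    <required element="record" child="title"/>
    <required element="record" child="date"/>
  </constraints>
  <collection>
    <record><title>Sample record A</title><date>2000-01-01</date></record>
    <record><title>Sample record B</title></record>
  </collection>
</archive>
"""


def validate(archive_xml: str) -> List[str]:
    """Check an archive against the constraints stored inside it.

    Returns a list of human-readable violations (empty if the archive is valid).
    """
    root = ET.fromstring(archive_xml.strip())
    violations = []
    # Each <required> rule states that every <element> must contain a <child>.
    for rule in root.findall("./constraints/required"):
        element, child = rule.get("element"), rule.get("child")
        for index, node in enumerate(root.iter(element), start=1):
            if node.find(child) is None:
                violations.append(f"{element} #{index} is missing <{child}>")
    return violations


if __name__ == "__main__":
    for violation in validate(ARCHIVE_XML):
        print(violation)
```

Because the rules are read from the archive rather than hard-coded in the validator, the same small program can check any archive that follows this layout, which is the sense of infrastructure independence illustrated here. Richer constraints would call for a more expressive rule language than this sketch provides.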