Archival description and linked data: a preliminary study of opportunities and implementation challenges

This paper presents the results of a study investigating how archives can connect their collections to related data sources through the use of Semantic Web technologies, specifically Linked Data. Questions explored included (a) What types of data currently available in archival surrogates such as Encoded Archival Description (EAD) finding aids and Machine-Readable Cataloging (MARC) records may be useful if converted to Linked Data? (b) For those potentially useful data points identified in archival surrogates, how might one align data structures found in those surrogates to the data structures of other relevant internal or external information sources? (c) What features of current standards and data structures present impediments or challenges that must be overcome in order to achieve interoperability among disparate data sources? To answer these questions, the researcher identified metadata elements of potential use as Linked Data in archival surrogates, as well as metadata element sets and vocabularies of data sets that could serve as pathways to relevant external data sources. Data sets chosen for the study included DBpedia and schema.org; metadata element sets examined included Friend of a Friend (FOAF), GeoNames, and Linking Open Descriptions of Events (LODE). The researcher then aligned tags found in the EAD encoding standard to related classes and properties found in these Linked Data sources and metadata element sets. To investigate the third question about impediments to incorporating Linked Data in archival descriptions, the researcher analyzed the locations and frequencies at which controlled and uncontrolled access points (personal and family name, corporate name, geographic name, and genre/form entities) appeared in a sample of MARC and EAD archival descriptive records, using a combination of hand counts and the natural language processing (NLP) tool OpenCalais.
The results of the location and frequency analysis, combined with the results of the alignment process, helped the researcher identify several critical challenges currently impeding interoperability among archival information systems and relevant Linked Data sources, including differences in granularity between archival and other data source vocabularies, and the inadequacy of current encoding standards for supporting semantic tagging of potential access points embedded in free text areas of archival surrogates.
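
As a rough illustration of the two analytical steps described above (aligning EAD tags to Linked Data classes, and counting where access points occur), the following minimal Python sketch may help. It is not the study's method or its actual alignment table: the `EAD_TO_LINKED_DATA` mapping, the `SAMPLE_EAD` fragment, and the `count_access_points` function are all hypothetical, and the sketch assumes an EAD document without XML namespaces.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical alignment of EAD access-point tags to candidate classes.
# These pairings are illustrative assumptions, not the study's results.
EAD_TO_LINKED_DATA = {
    "persname": "foaf:Person",
    "famname": "foaf:Group",
    "corpname": "foaf:Organization",
    "geogname": "gn:Feature",
    "genreform": None,  # no candidate class chosen in this sketch
}

# Invented EAD fragment (assumes no XML namespace is declared).
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <bioghist>
      <p><persname>Jane Doe</persname> taught in
      <geogname>Springfield</geogname> and later corresponded
      with John Roe.</p>
    </bioghist>
    <controlaccess>
      <persname>Doe, Jane</persname>
      <corpname>Example Historical Society</corpname>
      <genreform>Correspondence</genreform>
    </controlaccess>
  </archdesc>
</ead>
"""


def count_access_points(xml_text):
    """Count tagged access points by location and tag name.

    Location is "controlaccess" when the element sits inside a
    <controlaccess> block, otherwise "free text".
    """
    root = ET.fromstring(xml_text.strip())
    counts = Counter()

    def walk(element, inside_controlaccess):
        inside = inside_controlaccess or element.tag == "controlaccess"
        if element.tag in EAD_TO_LINKED_DATA:
            location = "controlaccess" if inside else "free text"
            counts[(location, element.tag)] += 1
        for child in element:
            walk(child, inside)

    walk(root, False)
    return counts


if __name__ == "__main__":
    for (location, tag), n in sorted(count_access_points(SAMPLE_EAD).items()):
        target = EAD_TO_LINKED_DATA[tag] or "(no candidate class)"
        print(f"{location:14} {tag:10} {n}  ->  {target}")
```

Note that the name "John Roe" in the sample's free text is not tagged at all, so a tag-based count like this one misses it. Untagged names of this kind are the sort of access point that would have to be found by hand counts or an NLP tool.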
