Linked Library Data Now!

It is not news to anybody in the library profession that our data are a mess. We have silo sitting next to silo, with much duplication of data; arcane, inefficient, and sometimes completely broken methods of determining that two records describe the same thing; and very little control over relating a resource in one system to a resource in a completely different application (even when the two services serve a similar purpose), much less to data available outside the institution. Our identifiers are weak, especially beyond very specific, record-level identifiers (OCLC number, ISSN, ISBN, 001/035 for local purposes, and the like), and even these cover only subsets of our collections and may be invalid (especially ISxNs). The actual concepts contained within these records (creators, organizations, subjects, series, and so on) are mere strings, subject to variation depending on who cataloged them, or simply to the passage of time and the inevitability of mortality. Is “Reagan, Ronald 1911-” really a valid, stable, and persistent identifier?

Libraries tend to view their information solely as records: self-contained and able to exist in isolation, without any external dependencies. This places a tremendous burden on each record to carry a large amount of duplicated and possibly inconsistent metadata. As the number of records expands, these redundancies grow, more errors creep in, and the ability to authoritatively relate these strings to each other wanes.

This document-centric world view of librarians has historical roots, of course. When all of the information about an item in your collection had to fit on a single 7.5 × 12.5 cm card, it made sense to bundle it all together. Because libraries still use the MARC format as the lingua franca of information sharing, this precedent still holds. Even as libraries have moved onto the Web and their file formats have moved to XML, this is