Paper information - A preliminary evaluation of HathiTrust metadata: Assessing the sufficiency of legacy records

Print-based libraries use metadata (specifically MARC catalog records) both for bibliographic control and to support discovery through online public access catalogs. Depending on its accuracy, completeness, and detail, metadata can afford an aerial view of a collection's topical strengths, scope of coverage, and item-to-item relationships, but the view offered is in part a function of metadata design. Most MARC records were created to support the management of large print collections and were optimized to meet the requirements of library online public access catalogs. How well do pre-existing MARC records serve the discovery needs of scholars using a large-scale digital library that hosts collections of retrospectively digitized books and serials? This paper reports on an ongoing assessment of the utility of the MARC-based metadata underlying the HathiTrust Digital Library and explores the implications for advanced computational access to texts in the HathiTrust. We consider the utility of metadata to scholars creating worksets for analysis, examining three user scenarios drawn from an ongoing user-requirements study conducted for the HathiTrust Research Center: (1) using metadata fields in combination for corpus characterization and discovery; (2) relying on metadata to identify resources of interest; and (3) using bibliographies of known items to seed research worksets. Our goal is to better understand the need for metadata remediation and augmentation and to assess the scope of additional work required.

Katrina Fenlon | Timothy W. Cole | Colleen Fallaw | Myung-Ja Han

[1] Timothy W. Cole. Creating a Framework of Guidance for Building Good Digital Collections, 2002, First Monday.

[2] Katrina Fenlon, et al. Beyond size and search: Building contextual mass in digital aggregations for scholarly use, 2010, ASIST.

[3] Constance Malpas. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment, 2011.

[4] Helmut Berger, et al. A Comparison of Text-Categorization Methods Applied to N-Gram Frequency Statistics, 2004, Australian Conference on Artificial Intelligence.

[5] Besiki Stvilia, et al. Is 'Quality' Metadata 'Shareable' Metadata? The Implications of Local Metadata Practice on Federated Collections, 2005.

[6] Diane Hillmann, et al. Analyzing Metadata for Effective Use and Re-Use, 2003, Dublin Core Conference.

[7] Carole L. Palmer, et al. Trends in metadata practices: a longitudinal study of collection federation, 2007, JCDL '07.

[8] J. Stephen Downie, et al. Scholar-built collections: A study of user requirements for research in large-scale digital libraries, 2014, ASIST.