Library and museum digital collections are increasingly aggregated at various levels. Large-scale aggregations, often characterized by heterogeneous or messy metadata, pose unique and growing challenges to aggregation administrators, not only in facilitating end-user discovery and access but also in performing basic administrative and curatorial tasks at scale, such as locating messy data and determining the overall topical landscape of the aggregation. This poster describes early findings on using statistical text analysis techniques to improve the scalability of an aggregation development workflow for a large-scale aggregation. These techniques hold great promise for automating historically labor-intensive evaluative aspects of aggregation development, and they form the basis for an aggregator's dashboard: a statistical text-analysis-driven tool for supporting large-scale aggregation development and maintenance through multifaceted, automatic visualization of an aggregation's metadata quality and topical coverage. The aggregator's dashboard will support principled yet scalable aggregation development.
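The poster does not specify which statistical text analysis techniques drive the dashboard; the following is a minimal sketch assuming topic modeling (LDA via scikit-learn) over harvested item-level metadata as one plausible technique, with a crude record-length check standing in for a metadata-quality signal. The sample records, field names, and sparseness threshold are hypothetical.

```python
# Illustrative sketch: summarizing the topical landscape of harvested metadata
# records with a topic model, and flagging sparsely described records.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical harvested records; in practice these might come from an
# OAI-PMH harvest of the aggregation's item-level metadata.
records = [
    {"id": "rec1", "title": "Civil War letters",
     "description": "Correspondence of Illinois soldiers, 1861-1865",
     "subject": "United States--History--Civil War"},
    {"id": "rec2", "title": "Prairie photographs",
     "description": "", "subject": ""},
    {"id": "rec3", "title": "Oral histories",
     "description": "Interviews with farm families in central Illinois",
     "subject": "Agriculture; Oral history"},
]

# Concatenate free-text fields into one document per record.
docs = [" ".join(filter(None, (r["title"], r["description"], r["subject"])))
        for r in records]

# Flag records whose combined text is too short to support discovery --
# a stand-in for one of the dashboard's metadata-quality views.
SPARSE_THRESHOLD = 5  # assumed cutoff, in tokens
sparse = [r["id"] for r, d in zip(records, docs)
          if len(d.split()) < SPARSE_THRESHOLD]
print("Possibly under-described records:", sparse)

# Fit a small topic model to sketch the aggregation's topical coverage.
vectorizer = CountVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(matrix)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")
```

In a full workflow, the topic-term lists and quality flags produced above would feed the dashboard's visualizations rather than being printed to the console.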