Semi-automated collection evaluation for large-scale aggregations

Library and museum digital collections are increasingly aggregated at various levels. Large-scale aggregations, often characterized by heterogeneous or messy metadata, pose unique and growing challenges to aggregation administrators, not only in facilitating end-user discovery and access, but also in performing basic administrative and curatorial tasks at scale, such as finding messy data and determining the overall topical landscape of the aggregation. This poster describes early findings on using statistical text analysis techniques to improve the scalability of the development workflow for a large-scale aggregation. These techniques hold great promise for automating historically labor-intensive evaluative aspects of aggregation development, and they form the basis for an aggregator's dashboard. The dashboard is planned as a tool, driven by statistical text analysis, that supports large-scale aggregation development and maintenance through multifaceted, automatic visualization of an aggregation's metadata quality and topical coverage. It is intended to support principled yet scalable aggregation development.