Data quality assessment in digital score libraries

Sheet music scores have been the traditional way to preserve and disseminate Western classical music works for centuries. Nowadays, their content can be encoded in digital formats that yield a very detailed representation of music content expressed in the language of music notation. These digital scores therefore constitute an invaluable asset for digital library services such as search, analysis, clustering, recommendation, and synchronization with audio files. Like any other published data, however, digital scores may suffer from quality problems: for instance, they can contain incomplete or inaccurate elements. Because a “dirty” dataset may be unsuitable input for some use cases, users need to be able to estimate the quality level of the data they are about to use. This article presents the data quality management framework for digital score libraries (DSL) designed by the GioQoso multi-disciplinary project. The framework relies on a content model that identifies several information levels which are unfortunately blurred in digital score encodings. This content model then serves as a foundation for organizing the categories of quality issues that can occur in a music score, leading to a quality model. The quality model also positions each issue with respect to potential usage contexts, so that a consistent set of indicators can be attached to each context; together, these indicators measure how well a given score fits a specific usage. Finally, we report an implementation of these conceptual foundations in an online DSL.
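To make the idea of usage-dependent indicators more concrete, the following is a minimal sketch and not the GioQoso implementation. The indicator names, information levels, usage contexts, and the scoring formula are all hypothetical, chosen only to show how the same score can be fit for one usage and less fit for another.

```python
from dataclasses import dataclass

# All names below (issues, levels, usage contexts) are hypothetical placeholders.
@dataclass(frozen=True)
class Indicator:
    name: str               # quality issue being counted
    level: str              # information level the issue belongs to
    relevant_to: frozenset  # usage contexts for which the issue matters

INDICATORS = [
    Indicator("incomplete_measure", "notation", frozenset({"search", "analysis"})),
    Indicator("missing_lyrics", "content", frozenset({"search"})),
    Indicator("colliding_symbols", "engraving", frozenset({"display"})),
]

def fitness(issue_counts, n_elements, usage):
    """Return a value in [0, 1]: 1.0 means no issue relevant to `usage` was found.

    Assumed formula: one minus the ratio of relevant issues to score elements.
    """
    if n_elements <= 0:
        raise ValueError("n_elements must be positive")
    relevant = sum(
        issue_counts.get(ind.name, 0)
        for ind in INDICATORS
        if usage in ind.relevant_to
    )
    return max(0.0, 1.0 - relevant / n_elements)

# The same issue counts yield different fitness values per usage context.
counts = {"incomplete_measure": 2, "colliding_symbols": 10}
for usage in ("analysis", "display"):
    print(usage, fitness(counts, 100, usage))
```

In this sketch, engraving problems lower the fitness for display but leave the fitness for analysis untouched, which is the kind of context-dependent judgement the quality model is meant to support.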
