Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies

There has been a rapid growth of digitally available music data, including audio recordings, digitized images of sheet music, album covers and liner notes, and video clips. This huge amount of data calls for retrieval strategies that allow users to explore large music collections in a convenient way. More precisely, there is a need for cross-modal retrieval algorithms that, given a query in one modality (e.g., a short audio excerpt), find corresponding information and entities in other modalities (e.g., the name of the piece and the sheet music). This goes beyond exact audio identification and subsequent retrieval of metainformation as performed by commercial applications like Shazam [1].

[1] Ruslan Salakhutdinov, et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014, ArXiv.

[2] Anssi Klapuri, et al. Automatic music transcription: challenges and future directions, 2013, Journal of Intelligent Information Systems.

[3] Gerhard Widmer, et al. Tempo- and Transposition-invariant Identification of Piece and Score Position, 2014, ISMIR.

[4] Gerhard Widmer, et al. Fast Identification of Piece and Score Position via Symbolic Fingerprinting, 2012, ISMIR.

[5] Emilia Gómez, et al. Audio Cover Song Identification and Similarity: Background, Approaches, Evaluation, and Beyond, 2010, Advances in Music Information Retrieval.

[6] Jürgen Schmidhuber, et al. Deep learning in neural networks: An overview, 2014, Neural Networks.

[7] Meinard Müller, et al. Information retrieval for music and motion, 2007.

[8] Gerhard Widmer, et al. The Piano Music Companion, 2014, ECAI.

[9] Peter Grosche, et al. Toward characteristic audio shingles for efficient cross-version music retrieval, 2012, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10] Meinard Müller, et al. Automated Synchronization of Scanned Sheet Music with Audio Recordings, 2007, ISMIR.

[11] H. Barlow, et al. A dictionary of musical themes, 1975.

[12] Emilia Gómez, et al. Tonal representations for music retrieval: from version identification to query-by-humming, 2012, International Journal of Multimedia Information Retrieval.

[13] Malcolm Slaney, et al. Analysis of Minimum Distances in High-Dimensional Musical Spaces, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[14] Meinard Müller, et al. Matching Musical Themes based on noisy OCR and OMR input, 2015, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15] Emilia Gómez Gutiérrez, et al. Tonal description of music audio signals, 2006.

[16] Gerhard Widmer, et al. Robust Quad-Based Audio Fingerprinting, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[17] Meinard Müller, et al. A digital library framework for heterogeneous music collections: from document acquisition to cross-modal interaction, 2012, International Journal on Digital Libraries.

[18] Simon Dixon, et al. An End-to-End Neural Network for Polyphonic Piano Music Transcription, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[19] Avery Wang, et al. An Industrial Strength Audio Search Algorithm, 2003, ISMIR.

[20] Christopher Raphael, et al. Music score alignment and computer accompaniment, 2006, CACM.

[21] Meinard Müller, et al. Fundamentals of Music Processing, 2015, Springer International Publishing.

[22] Gerhard Widmer, et al. Learning Audio-Sheet Music Correspondences for Score Identification and Offline Alignment, 2017, ISMIR.

[23] Jakob Grue Simonsen, et al. Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images, 2015.

[24] Marc Leman, et al. Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification, 2014, ISMIR.

[25] Meinard Müller, et al. Bridging the Gap: Enriching YouTube Videos with Jazz Music Annotations, 2018, Front. Digit. Humanit.

[26] Meinard Müller, et al. Retrieving audio recordings using musical themes, 2016, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[27] Pedro Cano, et al. A Review of Audio Fingerprinting, 2005, J. VLSI Signal Process.

[28] Gerhard Widmer, et al. Classical Music on the Web - User Interfaces and Data Representations, 2015, ISMIR.

[29] Gerhard Widmer, et al. End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss, 2017, International Journal of Multimedia Information Retrieval.

[30] Markus Schedl, et al. Polyphonic piano note transcription with recurrent neural networks, 2012, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[31] Emilia Gómez, et al. A Comparison of Melody Extraction Methods Based on Source-Filter Modelling, 2016, ISMIR.