Audio Content-Based Music Retrieval

The rapidly growing corpus of digital audio material requires novel retrieval strategies for exploring large music collections. Traditional retrieval strategies rely on metadata that describe the actual audio content in words. When such textual descriptions are not available, one requires content-based retrieval strategies that use only the raw audio material. In this contribution, we discuss content-based retrieval strategies that follow the query-by-example paradigm: given an audio query, the task is to retrieve from a music collection all documents that are similar or related to the query in some sense. Such strategies can be loosely classified according to their "specificity", which refers to the degree of similarity between the query and the database documents: high specificity refers to a strict notion of similarity, whereas low specificity refers to a rather vague one. Furthermore, we introduce a second classification principle based on "granularity", which distinguishes between fragment-level and document-level retrieval. Using a classification scheme based on specificity and granularity, we identify various classes of retrieval scenarios, namely "audio identification", "audio matching", and "version identification". For these three important classes, we give an overview of representative state-of-the-art approaches, which also illustrate the sometimes subtle but crucial differences between the retrieval scenarios. Finally, we give an outlook on a user-oriented retrieval system that combines the various retrieval strategies in a unified framework.
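To make the two classification axes more concrete, here is a minimal Python sketch of query-by-example retrieval. It rests on several assumptions. The query and all documents are taken to be already converted into sequences of feature vectors (for instance, chroma-like frames). The function names, the cosine frame similarity, and the sliding-window comparison are hypothetical illustrative choices, not the approaches surveyed in this contribution. A similarity threshold serves as a crude stand-in for specificity, since a higher threshold demands closer matches. Granularity is reflected by returning either matching fragments (document, start frame, score) or, purely for illustration, whole documents ranked by their best fragment score.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: each frame is a feature vector, e.g. a 12-dimensional chroma-like vector.
Feature = Sequence[float]
Collection = Dict[str, List[Feature]]


def cosine(u: Feature, v: Feature) -> float:
    """Cosine similarity of two frames; 0.0 if either frame is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def fragment_scores(query: List[Feature], doc: List[Feature]) -> List[float]:
    """Mean frame-wise similarity of the query to every same-length window of doc."""
    n = len(query)
    if n == 0 or n > len(doc):
        return []
    return [
        sum(cosine(q, d) for q, d in zip(query, doc[s:s + n])) / n
        for s in range(len(doc) - n + 1)
    ]


def retrieve_fragments(
    query: List[Feature], collection: Collection, threshold: float
) -> List[Tuple[str, int, float]]:
    """Fragment-level retrieval (hypothetical): all windows scoring >= threshold.

    The threshold is only a crude proxy for specificity: raising it
    enforces a stricter notion of similarity.
    """
    hits = []
    for doc_id, doc in collection.items():
        for start, score in enumerate(fragment_scores(query, doc)):
            if score >= threshold:
                hits.append((doc_id, start, score))
    return sorted(hits, key=lambda h: -h[2])


def retrieve_documents(
    query: List[Feature], collection: Collection, threshold: float
) -> List[Tuple[str, float]]:
    """Document-level view (illustrative simplification): best fragment score per document."""
    best: Dict[str, float] = {}
    for doc_id, _start, score in retrieve_fragments(query, collection, threshold):
        best[doc_id] = max(best.get(doc_id, score), score)
    return sorted(best.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    collection = {
        "a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]],
        "b": [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
    }
    query = [[0.0, 1.0], [1.0, 1.0]]
    print(retrieve_fragments(query, collection, 0.99))  # strict: only the exact match
    print(retrieve_fragments(query, collection, 0.8))   # looser: near matches as well
    print(retrieve_documents(query, collection, 0.8))
```

In this toy collection, lowering the threshold from 0.99 to 0.8 admits the near matches in document "b" in addition to the exact match in "a". The retrieval scenarios named above generally also differ in their feature representations and matching strategies. This sketch therefore only illustrates the specificity and granularity axes, not any particular system.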
