From Sound to ‘Sense’ via Feature Extraction and Machine Learning: Deriving High-Level Descriptors for Characterising Music

Research in intelligent music processing is currently experiencing an enormous boost due to the emergence of the new application and research field of Music Information Retrieval (MIR). The rapid growth of digital music collections and the concomitant shift of the music market towards digital distribution urgently call for intelligent computational support in the automated handling of large amounts of digital music. Ideas for a wide variety of content-based music services are currently being developed in the music industry and in the research community. They range from content-based music search engines to automatic music recommendation services, from intuitive interfaces on portable music players to methods for automatically structuring and visualising large digital music collections, and from personalised radio stations to tools that permit the listener to actively modify and ‘play with’ the music as it is being played.

What all of these content-based services have in common is that they require the computer to ‘make sense of’ and ‘understand’ the actual content of the music: that is, to recognise and extract musically, perceptually and contextually meaningful (‘semantic’) patterns from recordings, and to associate descriptors with the music that make sense to human listeners. The musical descriptors of potential interest are many and varied. They range from low-level features of the sound, such as its bass content or its harmonic richness, to high-level concepts such as “hip hop” or “sad music”. Moreover, semantic descriptors may come in the form of atomic, discrete labels like “rhythmic” or “waltz”, or they may be complex, structured entities such as harmony and rhythmic structure. As it is impossible to cover all of these in one coherent chapter, we limit ourselves to a particular class of semantic descriptors.
This chapter, then, focuses on methods for automatically extracting high-level atomic descriptors for the characterisation of music. It will be shown how high-level terms can be inferred via a combination of bottom-up audio descriptor extraction and the application of machine learning algorithms, and that meaningful descriptors can be extracted not only from an analysis of the music (audio) itself, but also from extra-musical sources such as the internet (via ‘web mining’).
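The bottom-up pipeline sketched above can be illustrated in a few lines: low-level descriptors are computed directly from the audio signal, and a machine learning classifier then maps them to a high-level atomic label. The particular features (spectral centroid, RMS energy), the k-nearest-neighbour classifier, and the “tonal” vs. “percussive” labels used here are illustrative assumptions for a toy example on synthetic signals, not the specific methods of the chapter.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def spectral_features(signal, sr=22050):
    """Two simple low-level audio descriptors: spectral centroid and RMS energy."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    rms = np.sqrt(np.mean(signal ** 2))
    return [centroid, rms]

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 22050, endpoint=False)  # one second of audio at 22.05 kHz

# Synthetic training data: pure sine tones stand in for 'tonal' material,
# white noise for 'percussive' material.
X, y = [], []
for _ in range(20):
    tone = np.sin(2 * np.pi * rng.uniform(200, 800) * t)
    X.append(spectral_features(tone)); y.append("tonal")
    noise = rng.normal(0, 0.5, len(t))
    X.append(spectral_features(noise)); y.append("percussive")

# Bottom-up step 2: a machine learning algorithm maps features to a high-level label.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([spectral_features(np.sin(2 * np.pi * 440 * t))])[0])  # prints "tonal"
```

The same scheme scales up in the obvious way: richer descriptor sets (MFCCs, rhythm patterns) and stronger learners (support vector machines, decision trees) replace the toy features and classifier, while the overall feature-extraction-plus-learning structure stays unchanged.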
