Automated analysis of musical structure

Listening to music and perceiving its structure is a fairly easy task for humans, even for listeners without formal musical training. For example, we can notice changes of notes, chords and keys, though we might not be able to name them (segmentation based on tonality and harmonic analysis); we can parse a musical piece into phrases or sections (segmentation based on recurrent structural analysis); we can identify and memorize the main themes or the catchiest parts (hooks) of a piece (summarization based on hook analysis); and we can detect the most informative musical parts for making certain judgments (detection of salience for classification). However, building computational models to mimic these processes is a hard problem. Furthermore, the amount of digital music that has been generated and stored is already vast, and storing and retrieving this content efficiently is an important real-world problem. This dissertation presents our research on automatic music segmentation, summarization and classification using a framework combining music cognition, machine learning and signal processing. It will inquire scientifically into the nature of human perception of music, and offer practical solutions to difficult problems of machine intelligence for automatic musical content analysis and pattern discovery. Specifically, for segmentation, an HMM-based approach will be used for key change and chord change detection, and a method for detecting the self-similarity property using approximate pattern matching will be presented for recurrent structural analysis. For summarization, we will investigate the locations where the catchiest parts of a musical piece normally appear and develop strategies for automatically generating music thumbnails based on this analysis. For musical salience detection, we will examine methods for weighting the importance of musical segments based on the confidence of classification. Two classification techniques and their definitions of confidence will be explored. The effectiveness of all our methods will be demonstrated by quantitative evaluations and/or human experiments on complex real-world musical stimuli.

(Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
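
To make the idea of recurrent structural analysis through self-similarity more concrete, the following is a minimal sketch and not the dissertation's actual method. It assumes each audio frame has already been reduced to a small feature vector (the three-dimensional toy vectors below are invented for illustration), uses cosine distance as the frame comparison, and treats two frames as an approximate match when their distance falls under a hypothetical tolerance. The function names and the tolerance value are assumptions of this sketch.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat all-zero (silent) frames as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def self_similarity(frames):
    """Build the symmetric frame-by-frame distance matrix."""
    n = len(frames)
    return [[cosine_distance(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]


def best_repetition_lag(matrix, tolerance=0.1):
    """Return (lag, count) for the off-diagonal with the most approximate matches.

    A match at lag k means frame i and frame i + k are within `tolerance`
    of each other; many matches on one diagonal suggest a repeated section.
    """
    n = len(matrix)
    best_lag, best_count = None, 0
    for lag in range(1, n):
        count = sum(1 for i in range(n - lag) if matrix[i][i + lag] <= tolerance)
        if count > best_count:
            best_lag, best_count = lag, count
    return best_lag, best_count


# Toy input (assumed): a three-frame pattern A B C, a slightly varied
# repeat A' B' C', and the start of a third repeat.
frames = [
    [1.0, 0.0, 0.0],  # A
    [0.0, 1.0, 0.0],  # B
    [0.0, 0.0, 1.0],  # C
    [0.9, 0.1, 0.0],  # A'
    [0.1, 0.9, 0.0],  # B'
    [0.0, 0.1, 0.9],  # C'
    [1.0, 0.0, 0.0],  # A
    [0.0, 1.0, 0.0],  # B
]

lag, count = best_repetition_lag(self_similarity(frames))
print("most repeated lag:", lag, "matching frame pairs:", count)
```

Counting matches along each diagonal is only one simple way to read repetition out of the distance matrix. A real system would work on features extracted from audio and would need to tolerate tempo variation, which this sketch does not attempt.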
