ON FINDING MELODIC LINES IN AUDIO RECORDINGS

This paper presents our approach to finding the melodic line(s) in polyphonic audio recordings. The approach consists of two stages, partially rooted in psychoacoustic theories of music perception. The first stage finds regions with strong and stable pitch (melodic fragments). The second stage groups these fragments according to their properties (such as pitch and loudness) into clusters that represent the melodic lines of the piece. The Expectation-Maximization (EM) algorithm is used in both stages: to find the dominant pitch in a region, and to train the Gaussian mixture models that group fragments into melodies. The paper describes the entire process in more detail and provides some initial results.
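To make the second stage more concrete, the following is a minimal sketch of grouping melodic fragments with an EM-trained Gaussian mixture model. It is an illustration under stated assumptions, not the configuration used in the paper: the two features per fragment (mean pitch as a MIDI note number, loudness in dB), the diagonal covariances, the choice of two components, the deterministic initialisation, and the names `fit_gmm_diag` and `fragments` are all hypothetical.

```python
import math

def fit_gmm_diag(points, k, iters=50, var_floor=1e-3):
    """Fit a k-component diagonal-covariance GMM with EM.

    Returns (weights, means, variances, responsibilities).
    """
    n, d = len(points), len(points[0])
    # Assumed initialisation: means taken at evenly spaced positions
    # of the data sorted by the first feature (pitch).
    ordered = sorted(points)
    means = [list(ordered[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    # Start every component with the global per-feature variance.
    gmean = [sum(p[i] for p in points) / n for i in range(d)]
    gvar = [max(sum((p[i] - gmean[i]) ** 2 for p in points) / n, var_floor)
            for i in range(d)]
    variances = [gvar[:] for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each fragment,
        # computed in the log domain for numerical stability.
        resp = []
        for p in points:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for i in range(d):
                    v = variances[j][i]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (p[i] - means[j][i]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            for i in range(d):
                mu = sum(r[j] * p[i] for r, p in zip(resp, points)) / nj
                var = sum(r[j] * (p[i] - mu) ** 2
                          for r, p in zip(resp, points)) / nj
                means[j][i] = mu
                variances[j][i] = max(var, var_floor)
    return weights, means, variances, resp

# Hypothetical fragments: (mean pitch as MIDI note, loudness in dB).
fragments = [
    (72.0, -10.0), (74.0, -9.0), (71.0, -11.0), (73.0, -10.5),  # upper line
    (48.0, -18.0), (50.0, -17.0), (47.0, -19.0), (49.0, -18.5),  # lower line
]
weights, means, variances, resp = fit_gmm_diag(fragments, k=2)
# Assign each fragment to its most probable component (one melodic line each).
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print(labels)
```

In this toy data the two groups are well separated in both features, so the hard assignment is unambiguous; real fragments overlap far more, and the abstract does not specify which fragment properties or how many mixture components are used.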
