A robust predominant-F0 estimation method for real-time detection of melody and bass lines in CD recordings

This paper describes a robust method for estimating the fundamental frequency (F0) of melody and bass lines in monaural real-world musical audio signals containing sounds of various instruments. Most previous F0-estimation methods had great difficulty with such complex audio signals because they were designed for mixtures of only a few sounds. To make it possible to estimate the F0 of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst. It does not rely on the frequency component at the F0 itself, which is often unreliable, and instead obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. It evaluates the relative dominance of every possible F0 by using the expectation-maximization algorithm and considers the temporal continuity of F0s by using a multiple-agent architecture. Experimental results show that our real-time system can detect the melody and bass lines in audio signals sampled from commercially distributed compact discs.
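The idea of evaluating the relative dominance of every candidate F0 with the expectation-maximization algorithm can be illustrated with a small sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the spectrum is treated as a weighted mixture of harmonic "tone models" on a log-frequency (cent) axis, each tone model is a sum of Gaussians at the harmonics of one candidate F0, and EM re-estimates the mixture weights. The names `tone_model` and `estimate_f0_weights`, the number of harmonics, the 1/h amplitudes, and the Gaussian width are hypothetical choices, not taken from the paper. The band limiting and the multiple-agent tracking are not modeled here.

```python
import math

def tone_model(f0_cents, axis, n_harmonics=4, sigma=20.0):
    """Hypothetical tone model: Gaussians at the harmonics of f0 on a
    log-frequency axis (in cents), normalised to sum to 1 over the axis."""
    values = []
    for x in axis:
        v = 0.0
        for h in range(1, n_harmonics + 1):
            centre = f0_cents + 1200.0 * math.log2(h)  # h-th harmonic in cents
            v += (1.0 / h) * math.exp(-0.5 * ((x - centre) / sigma) ** 2)
        values.append(v)
    total = sum(values)
    return [v / total for v in values]

def estimate_f0_weights(spectrum, models, n_iter=50):
    """EM for the mixture weights of fixed tone models (a simplified sketch).

    spectrum: non-negative power values on the same axis as the models.
    models:   one normalised tone model per candidate F0.
    Returns one weight per candidate; the weights sum to 1.
    """
    total = sum(spectrum)
    obs = [s / total for s in spectrum]  # treat the spectrum as a distribution
    n = len(models)
    weights = [1.0 / n] * n              # start from uniform weights
    for _ in range(n_iter):
        new_weights = [0.0] * n
        for x, p_obs in enumerate(obs):
            if p_obs == 0.0:
                continue
            mix = sum(weights[k] * models[k][x] for k in range(n))
            if mix <= 0.0:
                continue
            # E-step: share of bin x attributed to each candidate F0.
            # M-step: accumulate that share, weighted by the observed power.
            for k in range(n):
                new_weights[k] += p_obs * weights[k] * models[k][x] / mix
        norm = sum(new_weights)
        weights = [v / norm for v in new_weights]
    return weights

# Synthetic example: a dominant tone at 1900 cents plus a weaker one at 1500.
axis = list(range(1000, 5001, 20))           # log-frequency bins, in cents
candidates = list(range(1200, 2401, 100))    # candidate F0s, in cents
models = [tone_model(f0, axis) for f0 in candidates]
strong = tone_model(1900, axis)
weak = tone_model(1500, axis)
spectrum = [0.7 * s + 0.3 * w for s, w in zip(strong, weak)]

weights = estimate_f0_weights(spectrum, models)
best_f0 = candidates[weights.index(max(weights))]
print("predominant F0 candidate (cents):", best_f0)
```

In this toy setting the candidate with the largest weight is taken as the predominant F0 for the frame. A real-time system would repeat this per frame and, as the abstract notes, use the temporal continuity of F0s to choose among the peaks.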
