A Predominant-F0 Estimation Method for Polyphonic Musical Audio Signals

In this paper I introduce a method, called PreFEst, for estimating the fundamental frequency (F0) of simultaneous sounds in monaural polyphonic audio signals. Most previous F0-estimation methods have had difficulty dealing with such complex audio signals because these methods were designed to deal with mixtures of only a few sounds. Without assuming the number of sound sources, PreFEst can estimate the relative dominance of every possible harmonic structure in the input mixture. It treats the mixture as if it contains all possible harmonic structures with different weights, and estimates their weights by MAP estimation. PreFEst can obtain the melody and bass lines by regarding the most predominant F0 in middleand highfrequency regions as the melody line and the one in a lowfrequency region as the bass line. Experimental results with compact-disc recordings showed that a real-time system implementing this method was able to detect melody and bass lines about 80% of the time these existed.

[1]  Matti Karjalainen,et al.  A computationally efficient multipitch analysis model , 2000, IEEE Trans. Speech Audio Process..

[2]  Masataka Goto A predominant-F/sub 0/ estimation method for CD recordings: MAP estimation using EM algorithm for adaptive tone models , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[3]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[4]  Masataka Goto,et al.  A robust predominant-F0 estimation method for real-time detection of melody and bass lines in CD recordings , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[5]  Masataka Goto,et al.  A Real-time Music Scene Description System: Detecting Melody and Bass Lines in Audio Signals , 1999 .

[6]  Anssi Klapuri,et al.  Multipitch estimation and sound separation by the spectral smoothness principle , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[7]  Alain de Cheveigné,et al.  Separation of concurrent harmonic sounds: Fundamental frequency estimation and a time-domain cancell , 1993 .

[8]  Hideki Kawahara,et al.  Multiple period estimation and pitch perception model , 1999, Speech Commun..

[9]  Ikuyo Masuda-Katsuse A new method for speech recognition in the presence of non-stationary, unpredictable and high-level noise , 2001, INTERSPEECH.

[10]  Masataka Goto,et al.  A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals , 2004, Speech Commun..