A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals

Abstract

In this paper, we describe the concept of music scene description and address the problem of detecting melody and bass lines in real-world audio signals containing the sounds of various instruments. Most previous pitch-estimation methods have had difficulty with such complex music signals because they were designed for mixtures of only a few sounds. To enable estimation of the fundamental frequency (F0) of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst. Rather than relying on the unreliable fundamental component, PreFEst obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. It estimates the relative dominance of every possible F0, represented as a probability density function (PDF) of the F0, by using MAP (maximum a posteriori probability) estimation, and it takes the F0’s temporal continuity into account by using a multiple-agent architecture. Experimental results with a set of ten music excerpts from compact-disc recordings showed that a real-time system implementing this method detected the melody and bass lines about 80% of the time they were present.
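The idea of estimating the relative dominance of every possible F0 can be illustrated with a small expectation-maximization (EM) loop. The sketch below is an illustrative simplification, not the PreFEst implementation. It treats a band-limited power spectrum on a log-frequency (cent) axis as a probability density, models that density as a weighted mixture of harmonic tone models (one per candidate F0), and re-estimates the mixture weights, which play the role of the F0 PDF. The tone-model shape (Gaussian peaks of width `sigma` with 1/h harmonic amplitudes), the symmetric Dirichlet pseudo-counts that turn the maximum-likelihood update into a MAP update, and all names (`tone_model`, `estimate_f0_pdf`, `prior_strength`) are assumptions made for illustration. Because each tone model is renormalized over the in-band bins only, a candidate is scored by whichever of its harmonics fall inside the analysis range, whether or not its fundamental does.

```python
import math


def gaussian(x, mu, sigma):
    """Unnormalized Gaussian bump used to spread each harmonic over nearby bins."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def tone_model(f0, bins, n_harmonics=6, sigma=20.0):
    """Hypothetical harmonic tone model p(x | F0) over log-frequency bins in cents.

    Harmonic h lies 1200*log2(h) cents above F0 and gets amplitude 1/h
    (an assumption). The model is renormalized over the in-band bins only,
    so a candidate is judged by whichever harmonics fall inside the band."""
    weights = [0.0] * len(bins)
    for h in range(1, n_harmonics + 1):
        center = f0 + 1200.0 * math.log2(h)
        for i, x in enumerate(bins):
            weights[i] += gaussian(x, center, sigma) / h
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


def estimate_f0_pdf(observed, bins, candidates, n_iter=30, prior_strength=0.0):
    """Estimate mixture weights w(F0), a discrete F0 PDF, with EM.

    observed: non-negative band-limited power on `bins`.
    prior_strength: total pseudo-count of a symmetric Dirichlet prior on the
    weights (hypothetical); 0 gives the maximum-likelihood update, > 0 a MAP update."""
    total = sum(observed)
    if total <= 0:
        raise ValueError("observed spectrum has no energy in the band")
    p_obs = [o / total for o in observed]
    models = [tone_model(f0, bins) for f0 in candidates]
    n = len(candidates)
    alpha = prior_strength / n  # pseudo-count per candidate
    w = [1.0 / n] * n           # start from a flat F0 PDF
    for _ in range(n_iter):
        # E-step: current mixture density p(x) at every bin
        mix = [sum(w[k] * models[k][i] for k in range(n)) for i in range(len(bins))]
        # M-step: share of the observed density explained by each candidate F0
        new_w = []
        for k in range(n):
            share = sum(p_obs[i] * w[k] * models[k][i] / mix[i]
                        for i in range(len(bins)) if mix[i] > 0)
            new_w.append((share + alpha) / (1.0 + prior_strength))
        w = new_w
    return w


if __name__ == "__main__":
    bins = [4000 + 25 * i for i in range(145)]       # analysis band: 4000-7600 cents
    candidates = [4000 + 50 * i for i in range(73)]  # candidate F0s on a 50-cent grid
    # Synthetic mixture: a stronger tone at 5000 cents and a weaker one at 6000 cents
    observed = [2 * a + b for a, b in zip(tone_model(5000, bins), tone_model(6000, bins))]
    w = estimate_f0_pdf(observed, bins, candidates)
    print("predominant F0 (cents):", candidates[w.index(max(w))])
```

In this sketch the limited frequency range is simply the list of bins passed in; analyzing a different range (for example, a lower one when looking for the bass line) only changes that input, and the predominant F0 is read off as the peak of the estimated weights.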
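Temporal continuity is handled by a multiple-agent architecture. One way to picture this is the hypothetical tracker below, a deliberately simplified sketch rather than the system's actual agent design. Each agent follows one trajectory of F0-PDF peaks: in every frame it claims the strongest unclaimed peak within `max_jump` cents of its current F0 and accumulates a decaying reliability score. Unclaimed peaks whose weight reaches `spawn_threshold` start new agents, agents that miss more than `max_misses` consecutive frames are dropped, and the most reliable agent supplies the output F0. The `Agent` class, the `track` function, and all thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    f0: float           # F0 currently followed by this agent, in cents
    reliability: float  # decaying sum of the peak weights it has claimed
    misses: int = 0     # consecutive frames without a matching peak


def track(frames, max_jump=100.0, decay=0.9, spawn_threshold=0.2, max_misses=3):
    """Follow F0-PDF peaks over time with a pool of simple agents.

    frames: one list of (f0_cents, weight) peaks per frame.
    Returns, per frame, the F0 of the most reliable agent (None if there is none)."""
    agents = []
    output = []
    for peaks in frames:
        claimed = set()
        # More reliable agents choose first, so established trajectories keep their peaks
        for agent in sorted(agents, key=lambda a: a.reliability, reverse=True):
            best = None
            for j, (f0, weight) in enumerate(peaks):
                if j in claimed or abs(f0 - agent.f0) > max_jump:
                    continue
                if best is None or weight > peaks[best][1]:
                    best = j
            agent.reliability *= decay
            if best is None:
                agent.misses += 1
            else:
                claimed.add(best)
                agent.f0 = peaks[best][0]
                agent.reliability += peaks[best][1]
                agent.misses = 0
        # Agents that have lost their trajectory for too long are removed
        agents = [a for a in agents if a.misses <= max_misses]
        # Strong peaks that no agent claimed start new trajectories
        for j, (f0, weight) in enumerate(peaks):
            if j not in claimed and weight >= spawn_threshold:
                agents.append(Agent(f0=f0, reliability=weight))
        output.append(max(agents, key=lambda a: a.reliability).f0 if agents else None)
    return output


if __name__ == "__main__":
    # A steady line near 6000 cents plus a one-frame, slightly stronger peak at 7000 cents
    frames = [
        [(6000, 0.5)],
        [(6000, 0.5)],
        [(6010, 0.5)],
        [(6000, 0.5), (7000, 0.6)],
        [(6000, 0.5)],
    ]
    print(track(frames))  # prints [6000, 6000, 6010, 6000, 6000]
```

In the example, a frame-by-frame argmax would jump to 7000 cents in the fourth frame, whereas the agent that has been following the line near 6000 cents has accumulated more reliability than the newly spawned one, so the output stays continuous.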
