Multiple Fundamental Frequency Estimation and Polyphony Inference of Polyphonic Music Signals

This paper presents a frame-based system for estimating the multiple fundamental frequencies (F0s) of polyphonic music signals from their short-time Fourier transform (STFT) representation. To estimate the number of sources along with their F0s, the noise level is estimated beforehand, and all possible combinations of pre-selected F0 candidates are then jointly evaluated. Given a set of F0 hypotheses, their hypothetical partial sequences are derived, taking into account where partials may overlap. A score function is used to select the plausible sets of F0 hypotheses. To infer the best combination, hypothetical sources are progressively combined and iteratively verified. A hypothetical source is considered valid if it either explains more energy than the noise or significantly improves the envelope smoothness once the overlapping partials are treated. The proposed system was submitted to the Music Information Retrieval Evaluation eXchange (MIREX) 2007 and 2008 contests, where accuracy was evaluated with respect to the number of sources inferred and the precision of the estimated F0s. The encouraging results demonstrate its competitive performance among state-of-the-art methods.
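As a rough illustration of the hypothesis-generation step described above, the following Python sketch enumerates combinations of F0 candidates and flags where their partials would overlap. It is not the paper's implementation: the function names, the ideal-harmonic assumption (partials at exact integer multiples of the F0), the fixed tolerance in hertz, and the polyphony cap are all assumptions made for this example, and the score function and noise-level estimation are left out entirely.

```python
from itertools import combinations


def partial_frequencies(f0, max_freq):
    """Ideal harmonic partials k * f0 up to max_freq (assumes no inharmonicity)."""
    return [k * f0 for k in range(1, int(max_freq // f0) + 1)]


def overlapping_partials(f0_set, max_freq, tol_hz):
    """List (f0_a, k_a, f0_b, k_b) for partials of two sources within tol_hz."""
    overlaps = []
    for f0_a, f0_b in combinations(sorted(f0_set), 2):
        partials_a = partial_frequencies(f0_a, max_freq)
        partials_b = partial_frequencies(f0_b, max_freq)
        for k_a, freq_a in enumerate(partials_a, start=1):
            for k_b, freq_b in enumerate(partials_b, start=1):
                if abs(freq_a - freq_b) <= tol_hz:
                    overlaps.append((f0_a, k_a, f0_b, k_b))
    return overlaps


def candidate_combinations(candidates, max_polyphony):
    """Yield every combination of 1..max_polyphony F0 candidates (hypothetical cap)."""
    for n in range(1, max_polyphony + 1):
        yield from combinations(sorted(candidates), n)


if __name__ == "__main__":
    candidates = [220.0, 330.0, 440.0]  # hypothetical pre-selected F0 candidates in Hz
    for combo in candidate_combinations(candidates, max_polyphony=2):
        print(combo, overlapping_partials(combo, max_freq=1000.0, tol_hz=1.0))
```

In a joint evaluation, each combination produced this way would be scored, and the flagged partials are the ones whose amplitudes cannot be attributed to a single source without further treatment.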
