Pitch Detection in Polyphonic Music using Instrument Tone Models

We propose a hidden Markov model (HMM) based system that detects the pitch of an instrument in polyphonic music using an instrument tone model. At every time frame, the system calculates the salience of each pitch hypothesis from the magnitudes of the harmonics associated with that hypothesis. A hypothesis selection method is introduced to choose hypotheses with sufficiently high salience as pitch candidates, and an instrument model then evaluates the likelihood of each candidate. The transition probability between successive pitch points is constructed using prior knowledge of the musical key of the input. Finally, an HMM integrates the instrument likelihood and the pitch transition probability. Quantitative evaluation shows that the proposed system performs well for different instruments. We also compare a Gaussian mixture model and kernel density estimation for instrument modeling, and find that kernel density estimation gives better overall performance while the Gaussian mixture model is more robust.
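As an illustration of the salience and hypothesis-selection steps described above, the following Python sketch sums spectral magnitudes at the harmonics of each pitch hypothesis and keeps only hypotheses whose salience exceeds a threshold. This is a minimal, generic harmonic-sum salience under assumed parameters (harmonic count, threshold, frame size); the paper's exact salience weighting and selection rule are not specified in the abstract.

```python
import numpy as np

def harmonic_salience(magnitude_spectrum, sample_rate, f0_hz, n_harmonics=10):
    """Sum spectral magnitudes at the first n_harmonics of a pitch hypothesis f0_hz.
    A generic harmonic-sum salience; the paper's actual weighting may differ."""
    n_fft = 2 * (len(magnitude_spectrum) - 1)   # length of the original rfft input
    bin_hz = sample_rate / n_fft                # frequency resolution per FFT bin
    salience = 0.0
    for h in range(1, n_harmonics + 1):
        idx = int(round(h * f0_hz / bin_hz))    # nearest bin to the h-th harmonic
        if idx >= len(magnitude_spectrum):
            break
        salience += magnitude_spectrum[idx]
    return salience

# Example: evaluate salience for a range of pitch hypotheses in one frame.
sr = 44100
frame = np.random.randn(4096)                   # stand-in for one analysis frame
spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
candidates = [(f0, harmonic_salience(spectrum, sr, f0))
              for f0 in np.arange(80.0, 800.0, 1.0)]

# Hypothesis selection: keep hypotheses with sufficiently high salience
# (the 0.5 relative threshold is an illustrative choice, not the paper's).
threshold = 0.5 * max(s for _, s in candidates)
selected = [(f0, s) for f0, s in candidates if s >= threshold]
```

The selected candidates would then be scored by the instrument model and linked across frames by the HMM, whose transition probabilities encode the musical key.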
