Multidimensional humming transcription using a statistical approach for query by humming systems

This paper proposes a new statistical pattern recognition approach to transcribing human humming. A musical note has two important attributes: pitch and duration. The proposed algorithm generates multidimensional humming transcriptions that contain both pitch and duration information. Query by humming provides a natural means of content-based retrieval from music databases, and this research provides a robust front end for such applications. Each note segment in the humming waveform is modeled by a hidden Markov model (HMM), while the pitch of the note is modeled by a Gaussian mixture model (GMM). Preliminary real-time recognition experiments, carried out with models trained on data from eight human subjects, demonstrate an overall correct recognition rate of around 80%.
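To make the two-stage idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes per-frame energy and pitch values (in MIDI semitone units) have already been extracted. A hypothetical two-state HMM (silence versus note) stands in for the note segment model, and a single Gaussian per semitone stands in for the Gaussian mixture pitch model. All names, features, and parameter values here are illustrative assumptions.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state path for an observation sequence (log domain)."""
    score = [{s: log_start[s] + log_emit(s, obs[0]) for s in states}]
    back = []
    for x in obs[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            cur[s] = prev[best] + log_trans[best][s] + log_emit(s, x)
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Assumed toy parameters (hand-picked here; a real system would train them).
STATES = ["sil", "note"]
LOG_START = {"sil": math.log(0.5), "note": math.log(0.5)}
LOG_TRANS = {
    "sil": {"sil": math.log(0.9), "note": math.log(0.1)},
    "note": {"sil": math.log(0.1), "note": math.log(0.9)},
}
ENERGY_MODEL = {"sil": (0.1, 0.05), "note": (1.0, 0.05)}  # (mean, variance)
NOTE_RANGE = range(48, 85)  # candidate MIDI note numbers
PITCH_VAR = 0.25            # variance of each per-note pitch Gaussian

def log_emit(state, energy):
    mean, var = ENERGY_MODEL[state]
    return log_gauss(energy, mean, var)

def transcribe(energy, pitch, frame_sec=0.01):
    """Return a list of (midi_note, duration_in_seconds) pairs."""
    path = viterbi(energy, STATES, LOG_START, LOG_TRANS, log_emit)
    notes, start = [], None
    # A trailing "sil" sentinel closes a note that runs to the last frame.
    for i, state in enumerate(path + ["sil"]):
        if state == "note" and start is None:
            start = i
        elif state != "note" and start is not None:
            frames = pitch[start:i]
            # Pick the note whose Gaussian best explains the segment's pitch.
            best = max(
                NOTE_RANGE,
                key=lambda n: sum(log_gauss(f, n, PITCH_VAR) for f in frames),
            )
            notes.append((best, (i - start) * frame_sec))
            start = None
    return notes
```

Calling `transcribe(energy, pitch)` on two equal-length lists of frame values yields one (pitch, duration) pair per detected note, which is the kind of multidimensional transcription the abstract describes. With equal variances the per-note Gaussians reduce to nearest-semitone rounding; a trained mixture model would capture singer-specific pitch deviation that this sketch ignores.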
