Polyphonic transcription based on temporal evolution of spectral similarity of Gaussian mixture models

This paper describes a system for transcribing multitimbral polyphonic music based on joint multiple-F0 estimation. At the frame level, all possible fundamental frequency (F0) candidates are selected. Using a competitive strategy, a spectral envelope is estimated for each combination of F0 candidates, under the assumption that a polyphonic sound can be modeled as a sum of weighted Gaussian mixture models (GMMs). Since in polyphonic music the current spectral content depends to a large extent on the immediately preceding one, the winning combination is chosen from the set of combinations that minimize the spectral distance between the input spectrum and the GMM spectrum in the current frame, selecting the one with the highest spectral similarity to past musical events. The system was tested on several real-world music recordings from the RWC Music Database. The evaluation shows encouraging results compared with a recent state-of-the-art method.

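As a rough illustration of this kind of pipeline, the Python sketch below models the partials of each F0 candidate as Gaussians in frequency, fits non-negative weights for every candidate combination by least squares, keeps the combinations with the smallest spectral distance to the current frame, and then selects the one whose modeled spectrum is most similar to the previous frame's model. All function and parameter names here (gaussian_partial_basis, fit_combination, select_combination, the partial count, the Gaussian width) are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of joint F0-combination scoring with Gaussian partial models.
# Names and parameters are hypothetical, not taken from the paper.
import itertools
import numpy as np

def gaussian_partial_basis(f0, freqs, n_partials=10, sigma=20.0):
    """One Gaussian per harmonic partial of f0; columns form the mixture basis."""
    basis = np.zeros((len(freqs), n_partials))
    for h in range(1, n_partials + 1):
        basis[:, h - 1] = np.exp(-0.5 * ((freqs - h * f0) / sigma) ** 2)
    return basis

def fit_combination(spectrum, freqs, combo):
    """Least-squares weights for the stacked Gaussian bases of all F0s in combo."""
    basis = np.hstack([gaussian_partial_basis(f0, freqs) for f0 in combo])
    weights, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    weights = np.clip(weights, 0.0, None)  # keep the mixture non-negative
    model = basis @ weights
    return model, np.linalg.norm(spectrum - model)

def select_combination(spectrum, freqs, candidates, prev_model, top_k=3):
    """Keep the top_k combinations by frame-wise spectral distance, then pick the
    one whose modeled spectrum is closest to the previous frame's model."""
    combos = [c for r in (1, 2, 3) for c in itertools.combinations(candidates, r)]
    scored = sorted((fit_combination(spectrum, freqs, c) + (c,) for c in combos),
                    key=lambda t: t[1])[:top_k]
    if prev_model is None:
        return scored[0][2], scored[0][0]
    best = min(scored, key=lambda t: np.linalg.norm(t[0] - prev_model))
    return best[2], best[0]

# Toy usage: a synthetic frame with two peaks near 220 Hz and 330 Hz.
freqs = np.linspace(0, 4000, 2048)
spectrum = (np.exp(-0.5 * ((freqs - 220) / 15) ** 2)
            + 0.8 * np.exp(-0.5 * ((freqs - 330) / 15) ** 2))
combo, model = select_combination(spectrum, freqs, [220.0, 330.0, 440.0], None)
print("selected F0 combination:", combo)
```

In practice some complexity or envelope-smoothness penalty would also be needed, since combinations containing more F0 candidates can otherwise achieve a lower fitting error simply by having more basis functions.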
[1] Daniel P. W. Ellis, et al. A Discriminative Model for Polyphonic Piano Transcription, 2007, EURASIP J. Adv. Signal Process.

[2] Anssi Klapuri, et al. Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[3] Hirokazu Kameoka, et al. A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[4] J. J. Carabias-Orti, et al. Note-event Detection in Polyphonic Musical Signals based on Harmonic Matching Pursuit and Spectral Smoothness, 2008.

[5] Miguel A. Alonso, et al. Extracting note onsets from musical recordings, 2005, IEEE International Conference on Multimedia and Expo.

[6] José Manuel Iñesta Quereda, et al. Multiple fundamental frequency estimation using Gaussian smoothness, 2008, IEEE International Conference on Acoustics, Speech and Signal Processing.

[7] M. P. Ryynänen, et al. Polyphonic music transcription using note event modeling, 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[8] Axel Röbel, et al. Multiple fundamental frequency estimation of polyphonic music signals, 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[9] Hirokazu Kameoka, et al. Specmurt Analysis of Polyphonic Music Signals, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[10] Mark B. Sandler, et al. Automatic Piano Transcription Using Frequency and Time-Domain Information, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[11] Masataka Goto, et al. A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals, 2004, Speech Commun.

[12] Mark Sandler, et al. Automatic Polyphonic Piano Note Extraction Using Fuzzy Logic in a Blackboard System, 2002.

[13] Masataka Goto, et al. RWC Music Database: Popular, Classical and Jazz Music Databases, 2002, ISMIR.

[14] DeLiang Wang, et al. Pitch Detection in Polyphonic Music using Instrument Tone Models, 2007, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).