Music Scene-Adaptive Harmonic Dictionary for Unsupervised Note-Event Detection

Harmonic decompositions are a powerful tool for dealing with polyphonic music signals in applications such as music visualization, music transcription, and instrument recognition. The usefulness of a harmonic decomposition depends on the design of a suitable harmonic dictionary. Music scene-adaptive harmonic atoms have been used for this purpose. These atoms are adapted to the musical instruments and to the music scene, including aspects related to the venue, the musicians, and other relevant acoustic properties. In this paper, an unsupervised process is proposed for obtaining music scene-adaptive spectral patterns for each MIDI note. The resulting harmonic dictionary is then applied to note-event detection with matching pursuits. For a music database consisting only of single-instrument signals, promising note-event detection results (high accuracy and low error rate) have been achieved.
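The abstract does not spell out the detection step, so what follows is only a minimal sketch, in Python, of generic greedy matching pursuit over one unit-norm spectral pattern per MIDI note. It is not the authors' method. The fixed geometric partial decay (`decay=0.8`), the 5 Hz bin spacing, the MIDI 36 to 96 range, and the helper names `midi_to_hz`, `harmonic_atom`, `build_dictionary`, and `matching_pursuit` are all hypothetical stand-ins; the paper instead learns its scene-adaptive patterns from the recordings without supervision. Because magnitude spectra are nonnegative, the sketch picks the atom with the largest positive correlation rather than the largest absolute one, which is a simplification of textbook matching pursuit.

```python
import math


def midi_to_hz(midi):
    """Equal-tempered fundamental frequency, with A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def harmonic_atom(midi, n_bins, bin_hz, n_harmonics=8, decay=0.8):
    """Hypothetical fixed spectral pattern for one MIDI note.

    Partial h (1-based) gets amplitude decay**(h-1) at its nearest bin;
    partials beyond the last bin are dropped. The atom is scaled to unit
    L2 norm. (The paper learns these patterns instead of fixing them.)
    """
    atom = [0.0] * n_bins
    f0 = midi_to_hz(midi)
    for h in range(1, n_harmonics + 1):
        k = round(h * f0 / bin_hz)
        if k < n_bins:
            atom[k] += decay ** (h - 1)
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom] if norm > 0 else atom


def build_dictionary(midi_notes, n_bins, bin_hz):
    """Map each MIDI note to its (hypothetical) unit-norm harmonic atom."""
    return {m: harmonic_atom(m, n_bins, bin_hz) for m in midi_notes}


def matching_pursuit(spectrum, dictionary, max_atoms=10, tol=1e-6):
    """Greedy matching pursuit over unit-norm harmonic atoms.

    Returns (midi, coefficient) pairs in selection order. Stops when the
    residual energy drops to tol times the input energy, when no atom has
    a positive correlation with the residual, or after max_atoms picks.
    """
    residual = list(spectrum)
    energy0 = sum(v * v for v in residual)
    picks = []
    for _ in range(max_atoms):
        if sum(v * v for v in residual) <= tol * energy0:
            break
        # Correlate the residual with every atom and keep the best match.
        best_midi, best_coef = None, 0.0
        for midi, atom in dictionary.items():
            coef = sum(r * a for r, a in zip(residual, atom))
            if coef > best_coef:
                best_midi, best_coef = midi, coef
        if best_midi is None:
            break
        # Subtract the selected atom's contribution from the residual.
        atom = dictionary[best_midi]
        residual = [r - best_coef * a for r, a in zip(residual, atom)]
        picks.append((best_midi, best_coef))
    return picks


if __name__ == "__main__":
    n_bins, bin_hz = 1000, 5.0  # assumed: 0 to 5 kHz at 5 Hz resolution
    dictionary = build_dictionary(range(36, 97), n_bins, bin_hz)
    # Synthetic frame: C4 (MIDI 60) at full level plus E4 (MIDI 64) at half level.
    mix = [c + 0.5 * e for c, e in zip(dictionary[60], dictionary[64])]
    for midi, coef in matching_pursuit(mix, dictionary):
        print(midi, round(coef, 3))
```

Each returned pair is a candidate note and its estimated amplitude in a single analysed frame. Turning such frame-wise picks into note events with onsets and offsets would need additional temporal processing that this sketch leaves out.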
