Paper Information - Live Drum Separation Using Probabilistic Spectral Clustering Based on the Itakura-Saito Divergence

Live Drum Separation Using Probabilistic Spectral Clustering Based on the Itakura-Saito Divergence

We present a live drum separation system for a specific target drumset, intended as the front end of a complete live drum understanding system. The system decomposes drum note onsets onto spectral drum templates by adapting techniques from non-negative matrix factorization. To account for the variety of sounds that a single drum can produce, multiple templates per drum are computed using a new Gamma mixture model clustering procedure, which uses the Itakura-Saito divergence as the distance measure in the cluster space. In addition, we use “tail” templates for each drum, which greatly improve separation accuracy when cymbals with long decay times are present.
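
To make the decomposition step concrete, the following is a minimal sketch of projecting one onset spectrum onto a fixed set of spectral templates by minimizing the Itakura-Saito divergence with the standard multiplicative update from the non-negative matrix factorization literature. It is not the authors' implementation: the function names (`is_divergence`, `reconstruct`, `decompose`), the toy templates, and the iteration count are assumptions made for illustration, and the sketch omits the Gamma mixture clustering and the tail templates. All spectrum and template entries are assumed to be strictly positive.

```python
import math


def is_divergence(x, v):
    """Itakura-Saito divergence D_IS(x | v), summed over frequency bins."""
    return sum(xf / vf - math.log(xf / vf) - 1.0 for xf, vf in zip(x, v))


def reconstruct(templates, h):
    """Model spectrum v[f] = sum over k of templates[k][f] * h[k]."""
    n_bins = len(templates[0])
    return [sum(w[f] * hk for w, hk in zip(templates, h)) for f in range(n_bins)]


def decompose(x, templates, n_iter=200):
    """Estimate non-negative activations h with the templates held fixed.

    Uses the multiplicative update commonly applied for the
    Itakura-Saito divergence:
        h[k] <- h[k] * sum_f(w[f] * x[f] / v[f]**2) / sum_f(w[f] / v[f])
    """
    h = [1.0] * len(templates)  # hypothetical initialization
    for _ in range(n_iter):
        v = reconstruct(templates, h)
        new_h = []
        for w, hk in zip(templates, h):
            num = sum(wf * xf / (vf * vf) for wf, xf, vf in zip(w, x, v))
            den = sum(wf / vf for wf, vf in zip(w, v))
            new_h.append(hk * num / den)
        h = new_h
    return h


# Toy example: two made-up 4-bin templates and a spectrum mixing them.
templates = [[1.0, 0.5, 0.1, 0.05], [0.05, 0.1, 0.5, 1.0]]
spectrum = [2.0 * a + 0.5 * b for a, b in zip(*templates)]
activations = decompose(spectrum, templates)
print(activations)
print(is_divergence(spectrum, reconstruct(templates, activations)))
```

Because the update only multiplies each activation by a ratio of positive sums, the activations stay non-negative without any explicit constraint. In a system like the one described, each drum would contribute several templates, and the activations of a drum's templates would be combined to decide whether that drum was struck at the onset.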
