论文信息 - Non-negative Hidden Markov Modeling of Audio with Application to Source Separation

Non-negative Hidden Markov Modeling of Audio with Application to Source Separation

In recent years, there has been a great deal of work in modeling audio using non-negative matrix factorization and its probabilistic counterparts as they yield rich models that are very useful for source separation and automatic music transcription. Given a sound source, these algorithms learn a dictionary of spectral vectors to best explain it. This dictionary is however learned in a manner that disregards a very important aspect of sound, its temporal structure. We propose a novel algorithm, the non-negative hidden Markov model (N-HMM), that extends the aforementioned models by jointly learning several small spectral dictionaries as well as a Markov chain that describes the structure of changes between these dictionaries. We also extend this algorithm to the non-negative factorial hidden Markov model (N-FHMM) to model sound mixtures, and demonstrate that it yields superior performance in single channel source separation tasks.

[1] Maurice Charbit,et al. Factorial Scaled Hidden Markov Model for polyphonic audio representation and source separation , 2009, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[2] Bhiksha Raj,et al. A Probabilistic Latent Variable Model for Acoustic Modeling , 2006 .

[3] Tuomas Virtanen,et al. Speech recognition using factorial hidden Markov models for separation in the feature space , 2006, INTERSPEECH.

[4] John R. Hershey,et al. Single Channel Speech Separation Using Factorial Dynamics , 2006, NIPS.

[5] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[6] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[7] Michael I. Jordan,et al. Factorial Hidden Markov Models , 1995, Machine Learning.

[8] P. Smaragdis,et al. Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[9] Rémi Gribonval,et al. Audio source separation with a single sensor , 2006, IEEE Transactions on Audio, Speech, and Language Processing.