Compositional models for audio processing

Many classes of data are composed as constructive combinations of parts. By “constructive” combination, we mean additive combinations that do not result in the subtraction or diminishment of any of the parts. We will refer to such data as “compositional” data. Typical examples include population or count data, where the total count of a population is obtained as the sum of the counts of its subpopulations. To characterize such data, a variety of mathematical models have been developed in the literature which, in keeping with the nature of the data, represent them as non-negative linear combinations of parts that are themselves non-negative, ensuring that the combination does not result in subtraction or diminishment. We will refer to such models as “compositional” models.
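As a minimal illustration, and not the specific method of any particular work, the sketch below fits one widely used compositional model, non-negative matrix factorization (NMF), using the Lee and Seung multiplicative updates for the squared Euclidean cost. The function name `nmf`, the toy matrix, and the iteration count are illustrative assumptions. `V` stands in for a magnitude spectrogram (frequency bins by frames), the columns of `W` are the non-negative parts, and `H` holds their non-negative weights in each frame.

```python
import numpy as np


def nmf(V, rank, n_iter=2000, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (F x T) as W @ H with W, H >= 0.

    Hypothetical helper: Lee-Seung multiplicative updates for the squared
    Euclidean cost. Each update multiplies by a ratio of non-negative terms,
    so W and H stay non-negative when initialized non-negative.
    """
    if np.any(V < 0):
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps  # non-negative parts (e.g. spectral atoms)
    H = rng.random((rank, T)) + eps  # non-negative weights (activations)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data (assumed for illustration): 4 frequency bins built from 2 parts.
parts = np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.3]])
weights = np.array([[1.0, 0.0, 2.0, 0.5], [0.0, 1.5, 1.0, 0.5]])
V = parts @ weights  # every entry is an additive mix of the two parts

W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

Because the factors never change sign, each column of `V` is explained as a purely additive mix of the learned parts, which is exactly the constructive combination described above.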
