Cascade processing for speeding up sliding window sparse classification

Sparse representations have been found to provide high classification accuracy in many fields. Their drawback is a high computational load. In this work, we propose a novel cascaded classifier structure that speeds up the decision process while still utilizing sparse signal representations. In particular, we apply the cascaded decision process to a noise-robust automatic speech recognition task. The cascade is implemented using a feedforward neural network (NN) and time-sparse versions of the non-negative matrix factorization (NMF) based sparse classification method of [1]. The recognition accuracy of our cascade is among the three best in the recent CHiME2013 benchmark, and the cascade reaches the accuracy of NMF alone, as in [1], six times faster.
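
The cascaded decision process can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the confidence rule (maximum softmax posterior against a fixed threshold), the threshold value, and the names `cascade_classify`, `toy_fast` and `toy_slow` are assumptions made for illustration. A cheap first stage, standing in for the NN, labels each sliding window, and only windows it is unsure about are passed to an expensive second stage, standing in for the NMF-based sparse classifier.

```python
import math
from typing import Callable, List, Sequence, Tuple

Scores = Callable[[float], Sequence[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into posteriors (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def cascade_classify(
    windows: Sequence[float],
    fast_scores: Scores,
    slow_scores: Scores,
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[List[int], int]:
    """Label each window; return (labels, number of slow-stage calls)."""
    labels: List[int] = []
    slow_calls = 0
    for w in windows:
        posteriors = softmax(fast_scores(w))
        best = argmax(posteriors)
        if posteriors[best] >= threshold:
            # The fast stage is confident: accept its decision.
            labels.append(best)
        else:
            # Otherwise defer this window to the expensive stage.
            slow_calls += 1
            labels.append(argmax(slow_scores(w)))
    return labels, slow_calls


# Hypothetical stand-ins: each "window" is a single number.
def toy_fast(w: float) -> List[float]:
    return [4.0 * w, -4.0 * w]  # confident only when |w| is large


def toy_slow(w: float) -> List[float]:
    return [1.0, 0.0] if w >= 0 else [0.0, 1.0]


labels, slow_calls = cascade_classify([2.0, 0.1, -0.05, -3.0], toy_fast, toy_slow)
print(labels, slow_calls)
```

In this toy setting only the two ambiguous windows reach the second stage, which is where a cascade of this kind saves computation. How much it saves in practice depends on how often the first stage is confident.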

[1] Hui Jiang, et al. Confidence measures for speech recognition: A survey, 2005, Speech Commun.

[2] Yi-Hsuan Yang, et al. Multipitch Estimation of Piano Music by Exemplar-Based Sparse Representation, 2012, IEEE Transactions on Multimedia.

[3] Hugo Van hamme, et al. Noise Robust Exemplar Matching Using Sparse Representations of Speech, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[4] Ke Huang, et al. Sparse Representation for Signal Classification, 2006, NIPS.

[5] Daoqiang Zhang, et al. Ensemble sparse classification of Alzheimer's disease, 2012, NeuroImage.

[6] B. Achiriloaie, et al. VI REFERENCES, 1961.

[7] John Scott Bridle, et al. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition, 1989, NATO Neurocomputing.

[8] Jonathan Le Roux, et al. Deep NMF for speech separation, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9] Brendan J. Frey, et al. Winner-Take-All Autoencoders, 2014, NIPS.

[10] Kalle J. Palomäki, et al. Estimating Uncertainty to Improve Exemplar-Based Feature Enhancement for Noise Robust Speech Recognition, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[11] Tuomas Virtanen, et al. Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[12] William P. Marnane, et al. A novel dictionary for neonatal EEG seizure detection using atomic decomposition, 2012, 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[13] Song-Chun Zhu, et al. Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection, 2015, 2013 IEEE International Conference on Computer Vision.

[14] Björn W. Schuller, et al. Memory-Enhanced Neural Networks and NMF for Robust ASR, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[15] Jon Barker, et al. The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[16] Jon Barker, et al. The second ‘CHiME’ speech separation and recognition challenge: Datasets, tasks and baselines, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[17] J. Friedman. Fast sparse regression and classification, 2012.

[18] Richard Lippmann, et al. Neural Network Classifiers Estimate Bayesian a posteriori Probabilities, 1991, Neural Computation.

[19] Brahim Chaib-draa, et al. Incrementally Built Dictionary Learning for Sparse Representation, 2015, ICONIP.

[20] Nicholas Costen, et al. Sparse models for gender classification, 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings.

[21] Björn Schuller, et al. The TUM+TUT+KUL Approach to the 2nd CHiME Challenge: Multi-Stream ASR Exploiting BLSTM Networks and Sparse NMF, 2013, ICASSP 2013.

[22] Paul A. Viola, et al. Robust Real-Time Face Detection, 2001, International Journal of Computer Vision.

[23] Bhiksha Raj, et al. Active-set newton algorithm for non-negative sparse coding of audio, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24] Hermann Ney, et al. On the Probabilistic Interpretation of Neural Network Classifiers and Discriminative Training Criteria, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[25] Tuomas Virtanen, et al. Modelling non-stationary noise with spectral factorisation in automatic speech recognition, 2013, Comput. Speech Lang.

[26] Haibin Ling, et al. Robust Visual Tracking and Vehicle Classification via Sparse Representation, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.