An Energy-Efficient Speech-Extraction Processor for Robust User Speech Recognition in Mobile Head-Mounted Display Systems

An energy-efficient speech extraction (SE) processor is proposed for robust user speech recognition (SR) in head-mounted display (HMD) systems. User SE is essential for robust user SR in noisy environments. To achieve low-latency SE, the FastSE algorithm is proposed; it overcomes the time-consuming user speech selection process based on constrained independent component analysis and brings SE latency below 2 ms. In addition, a reinforced-FastSE scheme achieves 97.2% accuracy with only 33 kB of FastSE on-chip memory, which suits low-power HMD applications. A reconfigurable matrix operation accelerator is also implemented to accelerate the dominant matrix operations in SE energy-efficiently. As a result, the proposed SE processor achieves 1.3× higher speed with 4.24× smaller memory than the state-of-the-art work, making SR in noisy environments feasible for mobile HMD applications.
