Use Brain-Like Audio Features to Improve Speech Recognition Performance

Speech recognition, performed on signals captured by acoustic sensors, plays an important role in human-computer interaction, but it remains technically demanding: systems involve complex processing pipelines, rely heavily on neural network algorithms, and impose high computational requirements. Feature extraction is the first step of any speech recognition system. Existing features, such as Mel-frequency cepstral coefficients (MFCCs) and spectrograms, discard a large amount of acoustic information and lack biological interpretability. Likewise, existing self-supervised speech representation learning methods based on contrastive prediction must construct large numbers of negative samples during training, and their performance depends on large training batches, which demands substantial computational resources. In this paper, we therefore propose a new, brain-inspired feature extraction method, called SHH (spike-H), that achieves higher speech recognition rates than previous methods. The extracted features are fed into a classification model: a novel parallel CRNN with an attention mechanism that considers both temporal and spatial features. Experimental results show that the proposed CRNN achieves an accuracy of 94.8% on the Aurora dataset. Audio similarity experiments further show that SHH better distinguishes audio features, and ablation experiments confirm that SHH is well suited to spoken digit recognition.
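
To make the classifier described above concrete, the following is a minimal PyTorch sketch of a parallel CRNN with attention for spoken-digit classification: one convolutional branch models spatial (spectro-temporal) structure, one recurrent branch with attention pooling models temporal dynamics, and the two are fused before the output layer. The feature dimensions (64-dimensional frames, 100 frames per utterance) and all layer sizes are illustrative assumptions, not the paper's SHH configuration.

```python
import torch
import torch.nn as nn

class ParallelCRNN(nn.Module):
    """Illustrative parallel CRNN with attention for spoken-digit classification.

    The CNN branch captures spatial structure of the feature map, the GRU
    branch captures temporal dynamics, and additive attention pools the GRU
    outputs before the two branches are concatenated. All sizes are
    placeholders, not the configuration reported in the paper.
    """

    def __init__(self, n_features=64, n_classes=10):
        super().__init__()
        # Spatial branch: 2-D convolutions over the (feature x time) map.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Temporal branch: bidirectional GRU over the frame sequence.
        self.rnn = nn.GRU(n_features, 64, batch_first=True, bidirectional=True)
        # Additive attention scores over GRU time steps.
        self.attn = nn.Linear(128, 1)
        self.classifier = nn.Linear(32 * 4 * 4 + 128, n_classes)

    def forward(self, x):
        # x: (batch, n_frames, n_features)
        cnn_out = self.cnn(x.unsqueeze(1).transpose(2, 3))   # (B, 32, 4, 4)
        cnn_feat = cnn_out.flatten(1)                         # (B, 512)

        rnn_out, _ = self.rnn(x)                              # (B, T, 128)
        weights = torch.softmax(self.attn(rnn_out), dim=1)    # (B, T, 1)
        rnn_feat = (weights * rnn_out).sum(dim=1)             # (B, 128)

        return self.classifier(torch.cat([cnn_feat, rnn_feat], dim=1))

# Example: a batch of 8 utterances, each 100 frames of 64-dimensional features.
model = ParallelCRNN()
logits = model(torch.randn(8, 100, 64))
print(logits.shape)  # torch.Size([8, 10])
```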
