HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network

Because they do not require a clean reference signal, non-intrusive speech assessment methods have attracted great attention for objective evaluation. Recently, deep neural network (DNN) models have been applied to build non-intrusive speech assessment approaches and have been shown to provide promising performance. However, most DNN-based approaches are designed for normal-hearing listeners and do not consider hearing-loss factors. In this study, we propose a DNN-based hearing-aid speech assessment network (HASA-Net), formed by a bidirectional long short-term memory (BLSTM) model, that predicts speech quality and intelligibility scores simultaneously from input speech signals and specified hearing-loss patterns. To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessment into a unified DNN-based non-intrusive model for hearing aids. Experimental results show that the speech quality and intelligibility scores predicted by HASA-Net are highly correlated with two well-known intrusive hearing-aid evaluation metrics, the hearing-aid speech quality index (HASQI) and the hearing-aid speech perception index (HASPI), respectively.
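To make the architecture described above concrete, the following is a minimal, hypothetical PyTorch sketch of a HASA-Net-style model (not the authors' released code). It assumes the inputs are a per-frame spectral representation of the processed speech and a fixed-length hearing-loss pattern (e.g., audiogram thresholds at a few frequencies) that is broadcast to every frame, and that the BLSTM feeds two separate heads in a multi-task fashion: one for a HASQI-like quality score and one for a HASPI-like intelligibility score. All layer sizes are illustrative assumptions.

```python
# Hypothetical sketch of a HASA-Net-style multi-task BLSTM (illustrative only).
import torch
import torch.nn as nn

class HASANetSketch(nn.Module):
    def __init__(self, n_feats=257, n_hl=6, hidden=100):
        super().__init__()
        # BLSTM over spectral frames, each concatenated with the
        # hearing-loss pattern so predictions are listener-specific
        self.blstm = nn.LSTM(n_feats + n_hl, hidden,
                             batch_first=True, bidirectional=True)
        # two heads: quality (HASQI-like) and intelligibility (HASPI-like)
        self.quality_head = nn.Linear(2 * hidden, 1)
        self.intell_head = nn.Linear(2 * hidden, 1)

    def forward(self, spec, hearing_loss):
        # spec: (batch, frames, n_feats); hearing_loss: (batch, n_hl)
        hl = hearing_loss.unsqueeze(1).expand(-1, spec.size(1), -1)
        x = torch.cat([spec, hl], dim=-1)      # condition every frame
        h, _ = self.blstm(x)                   # (batch, frames, 2*hidden)
        frame_q = torch.sigmoid(self.quality_head(h)).squeeze(-1)
        frame_i = torch.sigmoid(self.intell_head(h)).squeeze(-1)
        # utterance-level scores via frame averaging, both in [0, 1]
        # as HASQI and HASPI scores are
        return frame_q.mean(dim=1), frame_i.mean(dim=1)

model = HASANetSketch()
spec = torch.randn(2, 120, 257)    # two utterances, 120 frames each
hl = torch.rand(2, 6) * 80.0       # audiogram thresholds in dB HL
q, i = model(spec, hl)
print(q.shape, i.shape)            # torch.Size([2]) torch.Size([2])
```

Training such a model would minimize a combined regression loss (e.g., the sum of mean-squared errors) against HASQI and HASPI targets, the standard multi-task learning setup.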
