HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network

Because they do not require a clean reference signal, non-intrusive speech assessment methods have attracted great attention for objective evaluation. Recently, deep neural network (DNN) models have been applied to build non-intrusive speech assessment approaches and have been shown to provide promising performance. However, most DNN-based approaches are designed for normal-hearing listeners and do not consider hearing-loss factors. In this study, we propose a DNN-based hearing-aid speech assessment network (HASA-Net), formed by a bidirectional long short-term memory (BLSTM) model, that predicts speech quality and intelligibility scores simultaneously from input speech signals and specified hearing-loss patterns. To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessment into a unified DNN-based non-intrusive model for hearing aids. Experimental results show that the speech quality and intelligibility scores predicted by HASA-Net are highly correlated with two well-known intrusive hearing-aid evaluation metrics, the hearing-aid speech quality index (HASQI) and the hearing-aid speech perception index (HASPI), respectively.
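To make the architecture described above concrete, the following is a minimal, hypothetical PyTorch sketch of a HASA-Net-style model (not the authors' released code). It assumes the inputs are a per-frame spectral representation of the processed speech and a fixed-length hearing-loss pattern (e.g., audiogram thresholds at a few frequencies) that is broadcast to every frame, and that the BLSTM feeds two separate heads in a multi-task fashion: one for a HASQI-like quality score and one for a HASPI-like intelligibility score. All layer sizes are illustrative assumptions.

```python
# Hypothetical sketch of a HASA-Net-style multi-task BLSTM (illustrative only).
import torch
import torch.nn as nn

class HASANetSketch(nn.Module):
    def __init__(self, n_feats=257, n_hl=6, hidden=100):
        super().__init__()
        # BLSTM over spectral frames, each concatenated with the
        # hearing-loss pattern so predictions are listener-specific
        self.blstm = nn.LSTM(n_feats + n_hl, hidden,
                             batch_first=True, bidirectional=True)
        # two heads: quality (HASQI-like) and intelligibility (HASPI-like)
        self.quality_head = nn.Linear(2 * hidden, 1)
        self.intell_head = nn.Linear(2 * hidden, 1)

    def forward(self, spec, hearing_loss):
        # spec: (batch, frames, n_feats); hearing_loss: (batch, n_hl)
        hl = hearing_loss.unsqueeze(1).expand(-1, spec.size(1), -1)
        x = torch.cat([spec, hl], dim=-1)      # condition every frame
        h, _ = self.blstm(x)                   # (batch, frames, 2*hidden)
        frame_q = torch.sigmoid(self.quality_head(h)).squeeze(-1)
        frame_i = torch.sigmoid(self.intell_head(h)).squeeze(-1)
        # utterance-level scores via frame averaging, both in [0, 1]
        # as HASQI and HASPI scores are
        return frame_q.mean(dim=1), frame_i.mean(dim=1)

model = HASANetSketch()
spec = torch.randn(2, 120, 257)    # two utterances, 120 frames each
hl = torch.rand(2, 6) * 80.0       # audiogram thresholds in dB HL
q, i = model(spec, hl)
print(q.shape, i.shape)            # torch.Size([2]) torch.Size([2])
```

Training such a model would minimize a combined regression loss (e.g., the sum of mean-squared errors) against HASQI and HASPI targets, the standard multi-task learning setup.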
