Data-driven method for non-intrusive speech intelligibility estimation

We propose a data-driven, non-intrusive method for speech intelligibility estimation. Starting from a large set of speech-signal-specific features, we apply a dimensionality reduction approach based on correlation and principal component analysis to find the features most relevant to intelligibility prediction. These features are then used to train a Gaussian mixture model, from which the intelligibility of unseen data is inferred. In our experiments, the method's estimates have a correlation of 0.92 with subjective intelligibility and a correlation of 0.96 with the ANSI standard Speech Intelligibility Index.
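
The pipeline described above (correlation-based feature selection, principal component analysis, then a Gaussian mixture model) can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the authors' implementation: the synthetic features stand in for real speech features, the function names (`select_by_correlation`, `fit_pca`, `fit_gmm`, `predict_intelligibility`) and all parameter values are hypothetical, and inference is done by fitting the mixture to the joint vector of reduced features and intelligibility score and taking the conditional mean, which is one common way to predict from a mixture model and may differ from what the paper does.

```python
import numpy as np

def select_by_correlation(X, y, keep):
    """Indices of the `keep` features with the largest |Pearson r| against y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(r))[:keep]

def fit_pca(X, n_components):
    """Return the feature mean and a projection onto the top principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def log_gauss(Z, mean, cov):
    """Log density of each row of Z under a full-covariance Gaussian."""
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, (Z - mean).T)
    log_det = 2 * np.log(np.diag(L)).sum()
    return -0.5 * (Z.shape[1] * np.log(2 * np.pi) + log_det + (sol ** 2).sum(axis=0))

def responsibilities(Z, weights, means, covs):
    """Posterior probability of each mixture component for each row of Z."""
    logp = np.stack([np.log(w) + log_gauss(Z, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def fit_gmm(Z, k, n_iter=100, reg=1e-6, seed=0):
    """Fit a k-component full-covariance mixture with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    means = Z[rng.choice(n, k, replace=False)]
    covs = np.array([np.cov(Z.T) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = responsibilities(Z, weights, means, covs)   # E-step
        nk = resp.sum(axis=0) + 1e-12                      # M-step
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        for j in range(k):
            diff = Z - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs

def predict_intelligibility(gmm, F):
    """Conditional mean of the last joint dimension given the features F."""
    weights, means, covs = gmm
    d = F.shape[1]
    resp = responsibilities(F, weights, means[:, :d], covs[:, :d, :d])
    cond = np.stack([m[d] + (F - m[:d]) @ np.linalg.solve(c[:d, :d], c[:d, d])
                     for m, c in zip(means, covs)], axis=1)
    return (resp * cond).sum(axis=1)

def run_demo(seed=0):
    """Synthetic stand-in data: 12 informative and 8 pure-noise features."""
    rng = np.random.default_rng(seed)
    n = 600
    latent = rng.normal(size=(n, 3))
    informative = latent @ rng.normal(size=(3, 12)) + 0.1 * rng.normal(size=(n, 12))
    X = np.hstack([informative, rng.normal(size=(n, 8))])
    y = 1 / (1 + np.exp(-(latent[:, 0] + 0.5 * latent[:, 1])))
    y = y + 0.02 * rng.normal(size=n)
    Xtr, Xte, ytr, yte = X[:400], X[400:], y[:400], y[400:]
    mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0) + 1e-12
    Xtr, Xte = (Xtr - mu) / sd, (Xte - mu) / sd
    idx = select_by_correlation(Xtr, ytr, keep=10)         # correlation step
    mean, W = fit_pca(Xtr[:, idx], n_components=3)         # PCA step
    Ftr, Fte = (Xtr[:, idx] - mean) @ W, (Xte[:, idx] - mean) @ W
    gmm = fit_gmm(np.column_stack([Ftr, ytr]), k=3)        # joint GMM
    pred = predict_intelligibility(gmm, Fte)
    return np.corrcoef(pred, yte)[0, 1], pred

if __name__ == "__main__":
    corr, _ = run_demo()
    print(f"held-out correlation on synthetic data: {corr:.3f}")
```

Selection, standardization, PCA and the mixture are all fitted on the training split only, so the held-out correlation is not inflated by information from the test data. The correlation this sketch prints refers to synthetic data and says nothing about the 0.92 and 0.96 figures reported above.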
