NIML: non-intrusive machine learning-based speech quality prediction on VoIP networks

Voice over Internet Protocol (VoIP) networks have recently emerged as a promising telecommunication medium for transmitting voice signals. A key question for researchers is how to estimate the quality of voice transmitted over VoIP, which matters for purposes such as system design and handling technical issues. Voice quality can be evaluated with two kinds of methods: subjective and objective. In this study, the authors propose a non-intrusive machine learning-based (NIML) objective method to estimate voice quality. In particular, they build a training set of parameters, drawn from both the network and the voice signal itself, labeled with the quality of each voice sample. These quality labels are obtained with the Perceptual Evaluation of Speech Quality (PESQ) method, an intrusive algorithm. The authors then train a set of classifiers on this training set to build models that estimate the quality of transmitted voice. The experimental results show that the classifier models perform well and that the Random Forest model outperforms the others, achieving a precision of 94.1%, a recall of 94.2%, and a receiver operating characteristic (ROC) area of 99.2%.
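The abstract uses PESQ scores as labels and reports precision, recall, and ROC area. The following is a minimal standard-library Python sketch, not the authors' code, of how such class labels and metrics could be computed. Everything specific here is an assumption: the three classes and their PESQ cut-offs in the hypothetical `THRESHOLDS` list, support-weighted averaging of precision and recall, and a one-vs-rest ROC area. The abstract does not say how PESQ scores were turned into classes or how multi-class metrics were averaged. Training the Random Forest itself is left out.

```python
"""Sketch: PESQ-derived class labels and the evaluation metrics named above.

Assumptions (not from the paper): the three quality classes and their
PESQ cut-offs, support-weighted averaging, and one-vs-rest ROC area.
"""
from collections import Counter

# Hypothetical cut-offs mapping a PESQ MOS-like score to a class label,
# checked from highest to lowest.
THRESHOLDS = [(3.5, "good"), (2.5, "fair")]


def pesq_to_label(mos: float) -> str:
    """Map a PESQ score to a coarse quality class using the assumed bins."""
    for cutoff, label in THRESHOLDS:
        if mos >= cutoff:
            return label
    return "poor"


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def weighted_precision_recall(y_true, y_pred):
    """Per-class precision/recall averaged with weights = class frequency."""
    support = Counter(y_true)
    n = len(y_true)
    w_precision = w_recall = 0.0
    for cls, count in support.items():
        p, r = precision_recall(y_true, y_pred, cls)
        w_precision += p * count / n
        w_recall += r * count / n
    return w_precision, w_recall


def roc_auc(is_positive, scores):
    """Binary ROC area as the probability that a random positive example
    scores higher than a random negative one (ties count as 0.5)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values invented for illustration only.
    mos_scores = [4.1, 3.2, 1.9, 3.8, 2.7, 2.2]
    y_true = [pesq_to_label(m) for m in mos_scores]
    y_pred = ["good", "fair", "poor", "fair", "fair", "poor"]
    print(y_true)  # ['good', 'fair', 'poor', 'good', 'fair', 'poor']
    wp, wr = weighted_precision_recall(y_true, y_pred)
    print(round(wp, 3), round(wr, 3))  # 0.889 0.833
    # One-vs-rest scores for the "good" class from a hypothetical classifier.
    good_scores = [0.9, 0.5, 0.1, 0.45, 0.3, 0.2]
    print(roc_auc([t == "good" for t in y_true], good_scores))  # 0.875
```

Note that support-weighted recall computed this way always equals plain accuracy. The pairwise ROC computation is quadratic in the test-set size, so a rank-based version would be preferable for large evaluations.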