Speech Quality Assessment Over Lossy Transmission Channels Using Deep Belief Networks

Many telephone services now run over IP networks. These networks are subject to disturbances, among which packet loss, quantified by the packet loss rate (PLR), is one of the most damaging. Impaired speech communication degrades the users' quality of experience, so speech quality assessment matters to telephone operators, and a methodology that predicts speech quality more accurately is of practical value. In this context, this letter introduces a novel nonintrusive speech quality classifier (SQC) model based on deep belief networks (DBN), in which a support vector machine with a radial basis function kernel is the classifier applied in the DBN to identify four speech quality classes. A speech database was built from unimpaired speech files taken from public databases, to which different PLR models and values were applied; a standardized intrusive method was used to calculate the quality index of each file. Results show that the SQC substantially outperforms ITU-T Recommendation P.563. Subjective tests were also performed to validate the SQC, which reached an accuracy of 95% in speech quality classification. Furthermore, a solution architecture is introduced that demonstrates the usefulness and flexibility of the proposed SQC.
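As an illustration of how PLR models and values can be applied to unimpaired speech, the following minimal sketch simulates two loss patterns. The abstract does not name the PLR models used, so the independent (Bernoulli) model and the simplified two-state Gilbert burst model shown here are assumptions, as are the function names, the 20 ms packet length at 8 kHz, and the choice to zero out lost packets without any concealment.

```python
import random

def bernoulli_loss(n_packets, plr, rng):
    """Independent loss: each packet is dropped with probability plr."""
    return [rng.random() < plr for _ in range(n_packets)]

def gilbert_loss(n_packets, p_good_to_bad, p_bad_to_good, rng):
    """Simplified two-state burst loss: a packet is lost whenever the
    channel is in the bad state (assumed model, not from the source)."""
    lost, bad = [], False
    for _ in range(n_packets):
        if bad:
            # Stay in the bad state unless the channel recovers.
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
        lost.append(bad)
    return lost

def apply_loss(samples, lost, packet_len):
    """Return a copy of samples with every lost packet replaced by zeros."""
    out = list(samples)
    for i, is_lost in enumerate(lost):
        if is_lost:
            start = i * packet_len
            end = min(start + packet_len, len(out))
            out[start:end] = [0] * (end - start)
    return out

rng = random.Random(0)            # fixed seed for repeatability
samples = [1] * 16000             # placeholder for 2 s of 8 kHz speech
packet_len = 160                  # 20 ms per packet at 8 kHz (assumed)
n_packets = len(samples) // packet_len

lost = bernoulli_loss(n_packets, 0.05, rng)
impaired = apply_loss(samples, lost, packet_len)
measured_plr = sum(lost) / n_packets   # empirical PLR of this realization

burst = gilbert_loss(n_packets, 0.05, 0.5, rng)
impaired_burst = apply_loss(samples, burst, packet_len)
```

In a pipeline like the one described, each impaired file would then be scored by the intrusive method to obtain its quality index; that scoring step is outside the scope of this sketch.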
