On Losses, Pauses, Jumps, and the Wideband E-Model

There is increasing interest in extending the E-Model, a parametric tool for speech quality estimation, to wideband and super-wideband contexts. The main motivation has been to quantify the quality gain offered by new codecs and communication scenarios. There have been numerous such contributions, each successful to some degree. This paper reports on an extension of the E-Model to the mixed narrowband/wideband (NB/WB) context. More specifically, we take a novel approach to deriving effective equipment impairment factors ($I_{e,WB,eff}$) by considering additional impairments related to the underlying communications network: the pause and jump temporal discontinuities, alongside network-related loss and pure codec-related impairments. While the effect of loss has been studied thoroughly and is already integrated into the E-Model, pauses and jumps have received little attention. Pauses and jumps manifest themselves as temporal dilation and contraction, respectively, of the speech signal presented to the listener, and are normally caused by the interaction between jitter and the jitter buffer. We first present a four-state Markov model to characterize, and also emulate, loss, pause, and jump impairments. We then present alternative models for computing effective equipment impairment factors. A large number of test stimuli were generated using different NB and WB codecs and evaluated with WB-PESQ, and genetic programming was employed to derive the equipment impairment factors. The proposed models correlate highly with WB-PESQ. With WB-PESQ as the reference model, they outperform the existing E-Model by approximately 29%. They also outperform the E-Model against results from auditory tests, and outperform models derived by multiple linear regression.
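To make the idea of a four-state Markov model for loss, pause, and jump impairments concrete, the following is a minimal sketch rather than the authors' implementation. It assumes one state per condition (normal playout, loss, pause, jump) and uses hypothetical transition probabilities chosen purely for illustration; the state names, the `simulate` and `state_rates` helpers, and all numeric values are assumptions that do not come from the paper.

```python
import random

# Assumed state set: one state per impairment plus normal playout.
STATES = ("normal", "loss", "pause", "jump")

# Hypothetical transition probabilities (each row sums to 1).
# These values are illustrative only and are not taken from the paper.
TRANSITIONS = {
    "normal": (0.94, 0.03, 0.02, 0.01),
    "loss":   (0.70, 0.25, 0.03, 0.02),
    "pause":  (0.80, 0.05, 0.10, 0.05),
    "jump":   (0.85, 0.05, 0.05, 0.05),
}


def simulate(n_packets, seed=0, start="normal"):
    """Emit one state label per packet by walking the Markov chain."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    state = start
    trace = []
    for _ in range(n_packets):
        trace.append(state)
        # Draw the next state from the row of the current state.
        state = rng.choices(STATES, weights=TRANSITIONS[state])[0]
    return trace


def state_rates(trace):
    """Fraction of packets spent in each state."""
    return {s: trace.count(s) / len(trace) for s in STATES}


if __name__ == "__main__":
    trace = simulate(10_000, seed=42)
    for name, rate in state_rates(trace).items():
        print(f"{name:>6}: {rate:.3f}")
```

A per-packet trace of this kind could be used to decide which packets of a coded stimulus are dropped, repeated, or skipped before the result is scored by an objective model. Raising a state's self-transition probability produces burstier impairments of that type.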