An Adaptive Bitrate Switching Algorithm for Speech Applications in Context of WebRTC

Web Real-Time Communication (WebRTC) combines a set of standards and technologies to enable high-quality audio, video, and auxiliary data exchange in web browsers and mobile applications. It enables peer-to-peer multimedia sessions over IP networks without additional plugins. The Opus codec, the default audio codec for speech and music streaming in WebRTC, supports a wide range of bitrates covering narrowband, wideband, super-wideband, and fullband audio bandwidths. Users of IP-based telephony expect high-quality audio. Their quality of experience is determined by their expectations, their emotional state, the content type, and other psychological factors, as well as by network quality of service and distortions introduced at the end terminals. To estimate the quality experienced by the end user of a voice transmission service, the E-model can be used; it is standardized in ITU-T Rec. G.107 (narrowband), ITU-T Rec. G.107.1 (wideband), and the most recent ITU-T Rec. G.107.2 (super-wideband). In this work, we present a quality of experience model built on the E-model that captures the impact of coding and packet loss on the quality perceived by the end user in WebRTC speech applications. Based on the computed Mean Opinion Score, a real-time adaptive codec parameter switching mechanism selects the optimal codec bitrate for the present network conditions. We present evaluation results showing the effectiveness of the proposed approach compared with the default codec configuration in WebRTC.
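
The idea of scoring each candidate bitrate with the E-model and switching to the best one can be illustrated with a minimal Python sketch. It is not the algorithm from this work. It uses the narrowband G.107 formulas for the effective equipment impairment factor and the R-to-MOS mapping, treats coding and packet loss as the only impairments, and assumes a default rating of 93.2 before those impairments. The `Mode` class, the `pick_mode` helper, and every `ie` and `bpl` value in `MODES` are hypothetical placeholders, not measured Opus parameters.

```python
from dataclasses import dataclass
from typing import Sequence

R_BASE = 93.2  # assumed narrowband default rating before codec/loss impairments


def r_to_mos(r: float) -> float:
    """Map an R rating to MOS using the narrowband G.107 conversion."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def effective_impairment(ie: float, bpl: float, ppl: float,
                         burst_r: float = 1.0) -> float:
    """Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl), with Ppl in percent."""
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


@dataclass(frozen=True)
class Mode:
    """Hypothetical codec operating point (values are illustrative only)."""
    bitrate_kbps: int
    ie: float   # equipment impairment factor at zero loss
    bpl: float  # packet-loss robustness factor


# Placeholder parameters chosen only to make the switching behaviour visible.
MODES = [
    Mode(bitrate_kbps=12, ie=20.0, bpl=25.0),
    Mode(bitrate_kbps=24, ie=10.0, bpl=15.0),
    Mode(bitrate_kbps=32, ie=5.0, bpl=8.0),
]


def predicted_mos(mode: Mode, ppl: float) -> float:
    """Predicted MOS for one mode at packet-loss percentage ppl."""
    return r_to_mos(R_BASE - effective_impairment(mode.ie, mode.bpl, ppl))


def pick_mode(modes: Sequence[Mode], ppl: float) -> Mode:
    """Return the mode with the highest predicted MOS at the given loss."""
    return max(modes, key=lambda m: predicted_mos(m, ppl))


if __name__ == "__main__":
    for loss in (0.0, 10.0):
        best = pick_mode(MODES, loss)
        print(f"loss={loss}% -> {best.bitrate_kbps} kbps")
```

With these placeholder values the sketch prefers the highest bitrate on a loss-free path and the lowest bitrate at 10% random loss, because the assumed robustness factor is larger for the low-bitrate mode. A wideband or super-wideband variant would need the corresponding G.107.1 or G.107.2 rating scale and codec parameters.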
