Same but different? — Using speech signal features for comparing conversational VoIP quality studies

In this paper we demonstrate how speech signal features can be used to detect and explain differences in human-to-human conversation tests. To this end, we compare the results of two conversational VoIP quality experiments designed to quantify the impact of network delay on perceived speech quality. Both studies followed the same procedures and used the same scenarios, but were conducted in two different labs. Our comparison shows that the two studies, despite having been executed correctly using the same test design, can still produce surprisingly different results regarding the users' quality perception on a MOS scale. In this respect, speech signal features extracted from conversation recordings help identify divergent participant behavior as a plausible cause of such differences. Our in-depth analysis reveals how novel parameters that we developed, such as the Intended and Unintended Interruption Rates (IIR, UIR) and the corrected Speaker Alternation Rate (SARcorr), can be used to determine the extent to which the results of different conversational speech quality studies are directly comparable and thus eligible for pooling.
