We present results from a pilot study aimed at developing an anchorable subjective speech quality test. The test uses multidimensional scaling techniques to obtain quantitative information about the perceptual attributes of speech. In the first phase of the study, subjects ranked perceptual distances between samples of speech produced by two different talkers, one male and one female, and processed by a variety of codecs. The resulting distance matrices were processed to obtain, for each talker, a stimulus space for the various speech samples. This stimulus space has two properties: distances between stimuli in the space correspond to perceptual distances between stimuli, and the dimensions of the space correspond to attributes used by the subjects in determining perceptual distances. Mean opinion scores (MOS) obtained in an earlier study were found to be highly correlated with position in the stimulus space, and the three dimensions of the stimulus space were found to have identifiable physical and perceptual correlates. In the second phase of the study, we developed techniques for fitting speech generated by a new codec under investigation into a previously established stimulus space. The user is provided with a collection of speech samples and with the stimulus space for these speech samples as determined by a large-scale listening test. The user then carries out a much smaller listening test to determine the position of the new stimulus in the previously established stimulus space. This system is anchorable, so that different versions of a codec under development can be compared directly, and it provides more detailed information than the single number provided by MOS testing. We suggest that this information could be used to advantage in algorithm development and in the development of objective measures of speech quality.
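The second-phase idea of placing a new stimulus into an existing stimulus space can be illustrated with a minimal sketch. This is not the fitting procedure used in the study, which is not specified here. It assumes, for simplicity, that the small listening test yields numeric distances from the new stimulus to each reference stimulus (the study itself collected rankings), and it minimizes a raw stress criterion by gradient descent. The function name `place_new_stimulus` and its parameters `steps` and `lr` are hypothetical.

```python
import math

def place_new_stimulus(anchors, dists, steps=2000, lr=0.1):
    """Estimate coordinates of a new stimulus in an existing stimulus space.

    anchors: coordinate tuples of the reference stimuli (assumed known)
    dists:   judged distances from the new stimulus to each anchor
             (assumption: metric distances, not rankings)
    Minimizes mean((||x - a_i|| - d_i)^2) by plain gradient descent.
    """
    dim = len(anchors[0])
    n = len(anchors)
    # Start the search at the centroid of the reference stimuli.
    x = [sum(a[k] for a in anchors) / n for k in range(dim)]
    for _ in range(steps):
        grad = [0.0] * dim
        for a, d in zip(anchors, dists):
            r = math.dist(x, a)
            if r == 0.0:
                continue  # gradient is undefined exactly at an anchor; skip this term
            coef = 2.0 * (r - d) / r
            for k in range(dim):
                grad[k] += coef * (x[k] - a[k])
        x = [x[k] - lr * grad[k] / n for k in range(dim)]
    return x

# Hypothetical three-dimensional stimulus space with five reference stimuli.
anchors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
new_point = (0.3, 0.6, 0.2)  # position to be recovered (illustrative only)
dists = [math.dist(new_point, a) for a in anchors]
estimate = place_new_stimulus(anchors, dists)
```

With noise-free distances and enough reference stimuli in general position, the estimate should approach the true position. Real listener judgments are noisy and ordinal, so a nonmetric fitting criterion would likely be more appropriate in practice.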