MACH1 FOR NONUNIFORM TIME-SCALE MODIFICATION OF SPEECH: THEORY, TECHNIQUE, AND COMPARISONS

We propose a new approach to nonuniform time compression, called Mach1, designed to mimic the natural timing of fast speech. At identical overall compression rates, listener comprehension for Mach1-compressed speech was between 5 and 31 percentage points higher than for linearly compressed speech, and response times dropped by 15%. For rates between 2.5 and 4.2 times real time, there was no significant comprehension loss with increasing Mach1 compression rates. In A–B preference tests, Mach1-compressed speech was chosen 95% of the time. This paper describes the Mach1 technique and our listener-test results. Audio examples can be found at http://www.interval.com/papers/1997-061/.

The research described in this paper is the basis for our submission to the 1998 International Conference on Acoustics, Speech, and Signal Processing (ICASSP). The description provided here is longer and more complete than what we could fit into the ICASSP paper format. However, since our ICASSP submission is effectively a subset of this description, we have included the IEEE copyright notice below.

Interval Research Corporation Technical Report #1997-061

Copyright 1998 IEEE. Published in the Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, May 12-15, 1998, Seattle, Washington. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.
