User Benefits of Non-Linear Time Compression

Compared with text, audio-video content is much harder to browse. Time compression, which speeds up the playback of audio-video content without changing the pitch, has been suggested as a key technology for supporting browsing. Simple forms of time compression are starting to appear in commercial streaming-media products from Microsoft and Real Networks. In this paper we explore the potential benefits of more recent and advanced types of time compression, called non-linear time compression. The most advanced of these algorithms exploit the fine-grain structure of human speech (e.g., phonemes) to speed up segments of speech differentially, so that the overall speed-up can be higher. We examine what gains end users actually achieve from these advanced algorithms, and whether those gains are worth the additional system complexity. Our results indicate that the gains today are quite small and may not be worth the additional complexity.
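To make the idea of differential speed-up concrete, the following is a minimal arithmetic sketch, not the algorithms evaluated in the paper. It assumes a recording can be split into segments that each tolerate a different speed-up factor; the segment durations, the factors, and the function name `overall_speedup` are all hypothetical values chosen for illustration.

```python
def overall_speedup(segments):
    """Return the overall speed-up for a list of (duration_s, factor) pairs.

    Each segment of `duration_s` seconds is played `factor` times faster,
    so it takes duration_s / factor seconds after compression.
    """
    original = sum(duration for duration, _ in segments)
    compressed = sum(duration / factor for duration, factor in segments)
    return original / compressed


# Linear compression: one factor applied uniformly to all 8 seconds.
linear = overall_speedup([(8.0, 1.5)])

# Non-linear compression (hypothetical factors): compress pauses and
# long vowels harder than the segments that carry the most information.
nonlinear = overall_speedup([
    (4.0, 1.5),  # information-dense speech, compressed gently
    (2.0, 3.0),  # pauses, compressed aggressively
    (2.0, 2.0),  # e.g. stretched vowels, compressed moderately
])

print(f"linear: {linear:.2f}x, non-linear: {nonlinear:.2f}x")
```

Under these assumed numbers, the non-linear schedule reaches a higher overall factor while the information-dense speech is compressed no more than in the linear case. Whether listeners actually benefit from that extra speed is the question the paper studies.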
