Temporal Compression of Speech: An Evaluation

Browsing speech recordings efficiently is difficult. Speech is linear and offers little abstraction, so listeners must play long segments of a recording to locate points of interest. We explore temporal compression algorithms that aim to reduce the time users need to listen to speech recordings while retaining the important content. We implement two main approaches to temporal compression: artificial speech rate alteration (speed-up) and removal of unimportant segments (excision). We evaluate these approaches by having listeners rate comprehension and listening effort for different types of temporal compression. At different compression levels, we compare the performance of various implementations of speed-up and excision, as well as techniques based on semantic features and on acoustic features. Our results indicate that listeners prefer low compression levels, excision over speed-up, and algorithms based on semantic rather than acoustic features. Finally, listeners responded negatively to hybrid algorithms that used speed-up to indicate missing regions in an excised recording.
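
To make the distinction between the two approaches concrete, the following is a minimal sketch rather than the implementation evaluated here. It assumes that a compression level is expressed as the fraction of the original duration that is retained (`keep_fraction`), that the recording has already been split into segments, and that each segment carries an importance score from some semantic or acoustic measure. The names `Segment`, `speed_up_duration`, and `excise`, and the greedy selection strategy, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical segment representation: (label, duration in seconds, importance score).
# How importance is computed (semantic or acoustic features) is outside this sketch.
Segment = Tuple[str, float, float]


def speed_up_duration(total_seconds: float, keep_fraction: float) -> float:
    """Playback time when the whole recording is uniformly sped up.

    Assumes keep_fraction is the retained share of the original duration,
    so keep_fraction = 0.5 corresponds to playing at twice the normal rate.
    """
    return total_seconds * keep_fraction


def excise(segments: List[Segment], keep_fraction: float) -> List[Segment]:
    """Keep the most important segments that fit within the time budget.

    Greedy selection by descending importance; kept segments are returned
    in their original order so the excised recording still plays in sequence.
    """
    budget = keep_fraction * sum(duration for _, duration, _ in segments)
    ranked = sorted(range(len(segments)),
                    key=lambda i: segments[i][2], reverse=True)
    kept = set()
    used = 0.0
    for i in ranked:
        duration = segments[i][1]
        if used + duration <= budget:
            kept.add(i)
            used += duration
    return [segments[i] for i in sorted(kept)]


if __name__ == "__main__":
    # Invented example data, not taken from the study.
    message = [
        ("greeting", 4.0, 0.1),
        ("date", 3.0, 0.9),
        ("filler", 5.0, 0.2),
        ("phone", 8.0, 0.8),
    ]
    total = sum(duration for _, duration, _ in message)
    print("speed-up playback time:", speed_up_duration(total, 0.6))
    print("excision keeps:", [label for label, _, _ in excise(message, 0.6)])
```

Under these assumptions, speed-up preserves every segment but plays all of them faster, whereas excision plays the retained segments at normal rate and drops the rest, which is why the quality of the importance scores matters much more for excision.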
