Time is of the essence: an evaluation of temporal compression algorithms

Although speech is a potentially rich information source, a major barrier to exploiting speech archives is the lack of useful tools for efficiently accessing lengthy speech recordings. This paper develops and evaluates techniques for temporal compression, that is, reducing the time people take to listen to a recording while still extracting critical information. We first describe an exploratory study that identifies novel excision techniques, which remove unimportant words or utterances from the recording. We then develop a new method for evaluating how well temporal compression supports users in forming a general understanding of a recording. Applying this method, we demonstrate that excision techniques are generally more effective than standard compression techniques that simply speed up the entire recording.
