Time-Compressing Speech: ASR Transcripts Are an Effective Way to Support Gist Extraction

A major problem for users of speech archives is that accessing speech is laborious. Prior work has developed methods that let users efficiently identify and access the gist of archived conversational recordings using their textual transcripts. Text processing techniques are applied to these transcripts to identify unimportant parts of a recording and excise them, reducing the time needed to identify its main points. However, our prior work relied on human-generated rather than automatically generated transcripts. This study compares excision applied to human-generated transcripts with excision applied to automatically generated (ASR) transcripts that have a state-of-the-art word error rate (WER) of 38%. We show that excision based on either type of transcript provides equivalent support for gist extraction. Furthermore, both outperform the standard speedup techniques used in current applications. This suggests that excision is a viable technique for gist extraction in many practical situations.
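The abstract does not specify which text processing technique decides what to excise. As a minimal sketch only, assuming each transcript segment carries start and end times from the recognizer's alignment, one could score segments with a simple TF-IDF-style weight (each segment treated as a document) and greedily keep the highest-scoring segments until a target fraction of the original duration is reached. The names `Segment`, `idf_weights`, `segment_score` and `excise` and the `keep_ratio` parameter are hypothetical, and the scoring is a stand-in rather than the method evaluated in this study.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance and its time span in the recording (seconds).

    Hypothetical structure: assumes the transcript is time-aligned to the audio.
    """
    start: float
    end: float
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def idf_weights(segments: list[Segment]) -> dict[str, float]:
    """Inverse document frequency, treating each segment as one 'document'."""
    n = len(segments)
    df = Counter()
    for seg in segments:
        df.update(set(seg.text.lower().split()))
    return {term: math.log(n / count) for term, count in df.items()}


def segment_score(seg: Segment, idf: dict[str, float]) -> float:
    """Mean TF-IDF weight per word; words shared by many segments score low."""
    words = seg.text.lower().split()
    if not words:
        return 0.0
    tf = Counter(words)
    return sum(count * idf.get(word, 0.0) for word, count in tf.items()) / len(words)


def excise(segments: list[Segment], keep_ratio: float) -> list[Segment]:
    """Drop low-scoring segments so that at most keep_ratio of the original
    duration remains; survivors are returned in their original time order."""
    idf = idf_weights(segments)
    budget = keep_ratio * sum(s.duration for s in segments)
    ranked = sorted(segments, key=lambda s: segment_score(s, idf), reverse=True)
    kept, used = [], 0.0
    for seg in ranked:
        # Greedy fill: keep a segment only if it still fits within the budget.
        if used + seg.duration <= budget:
            kept.append(seg)
            used += seg.duration
    return sorted(kept, key=lambda s: s.start)


if __name__ == "__main__":
    segments = [
        Segment(0.0, 3.0, "um yeah okay"),
        Segment(3.0, 6.0, "the remote control budget is twelve euros"),
        Segment(6.0, 9.0, "yeah okay"),
        Segment(9.0, 12.0, "we agreed on a rubber case"),
    ]
    for seg in excise(segments, keep_ratio=0.5):
        print(f"{seg.start:5.1f}-{seg.end:5.1f}s  {seg.text}")
```

With `keep_ratio=0.5` the example keeps the two content-bearing segments (the budget and the case) and excises the "um yeah okay" and "yeah okay" fillers, halving playback time. Because the sketch uses the transcript only to choose which time spans to keep, the listener still hears the original audio: recognition errors can change which segments are selected but not what is played back. A uniform speedup achieving the same 50% reduction would instead play every segment, fillers included, at twice normal speed.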
