Identification of Soundbite and Its Speaker Name Using Transcripts of Broadcast News Speech

This article presents a pipeline framework for identifying soundbites and their speaker names from Mandarin broadcast news transcripts. Both modules, soundbite segment detection and soundbite speaker name recognition, use a supervised classification approach with multiple linguistic features. We systematically evaluated the performance of each module and of the entire system, and investigated the effect of using automatic speech recognition (ASR) output and automatic sentence segmentation. We found that both components affect the pipeline system, and that errors in automatic speaker name recognition degrade overall system performance more than errors in soundbite segment detection. In addition, our experimental results show that using ASR output degrades system performance significantly, and that using automatic sentence segmentation greatly affects soundbite detection but has much less effect on speaker name recognition.
