A novel decoding framework for extractive speech summarization with Rhetorical Structure modeling

We propose a novel decoding framework, the hybrid Rhetorical Structure Decoding-Summary Extraction (RSD-SE) framework, for extracting summaries with rhetorical structure information from speech. The rhetorical structure hidden in speech data, which makes speech easier to understand, is often under-utilized. The hybrid RSD-SE framework automatically decodes this underlying information and combines rhetorical structure extraction with summarization to produce better-organized summaries. We show that the hybrid RSD-SE framework achieves an 82.24% ROUGE-L F-measure, a 3.33% absolute increase in lecture speech summarization performance over the baseline systems [1], [2].
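For readers unfamiliar with the reported metric, the following is a minimal sketch of a sentence-level ROUGE-L F-measure based on the longest common subsequence (LCS) of words. It is not the evaluation code behind the figures above: the whitespace tokenization, the function names `lcs_length` and `rouge_l_f`, and the choice `beta = 1.0` (equal weight on precision and recall) are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure (assumed whitespace tokenization)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate words on the LCS
    recall = lcs / len(ref)      # share of reference words on the LCS
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the lecture corpus.
    score = rouge_l_f("the cat sat on the mat", "the cat was on the mat")
    print(f"ROUGE-L F-measure: {score:.4f}")
```

Read as percentage points, the "3.33% absolute increase" implies a baseline score of about 82.24 - 3.33 = 78.91% on the same metric, rather than a 3.33% relative gain.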

[1] Berlin Chen, et al. Chinese Spoken Document Summarization Using Probabilistic Latent Topical Information, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[2] Fei Liu, et al. Using n-best recognition output for extractive summarization and keyword extraction in meeting speech, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[3] Pascale Fung, et al. RSHMM++ for extractive lecture speech summarization, 2008, 2008 IEEE Spoken Language Technology Workshop.

[4] Pascale Fung, et al. Automatic minute generation for parliamentary speech using conditional random fields, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5] Julia Hirschberg, et al. Comparing lexical, acoustic/prosodic, structural and discourse features for speech summarization, 2005, INTERSPEECH.

[6] Wen-Lian Hsu, et al. Combining Relevance Language Modeling and Clarity Measure for Extractive Speech Summarization, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[7] Geoffrey Zweig, et al. Recurrent conditional random field for language understanding, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8] Yang Liu, et al. 13. Speech Summarization, 2011.

[9] Jian Zhang, et al. A comparative study on collectives of term weighting methods for extractive presentation speech summarization, 2015, 2015 International Conference on Asian Language Processing (IALP).

[10] Hsin-Hsi Chen, et al. Extractive Broadcast News Summarization Leveraging Recurrent Neural Network Language Modeling Techniques, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[11] Hsin-Hsi Chen, et al. Leveraging Effective Query Modeling Techniques for Speech Recognition and Summarization, 2014, EMNLP.

[12] Hsin-Hsi Chen, et al. A recurrent neural network language modeling framework for extractive speech summarization, 2014, 2014 IEEE International Conference on Multimedia and Expo (ICME).

[13] Christopher D. Manning, et al. Effect of Non-linear Deep Architecture in Sequence Labeling, 2013, IJCNLP.

[14] Sadaoki Furui, et al. Automatic speech summarization applied to English broadcast news speech, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[16] Pascale Fung, et al. Learning deep rhetorical structure for extractive speech summarization, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[17] Gokhan Tur, et al. Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, 2011.

[18] Frédéric Béchet, et al. Call Centre Conversation Summarization: A Pilot Task at Multiling 2015, 2015, SIGDIAL Conference.

[19] Hsin-Min Wang, et al. A unified probabilistic generative framework for extractive spoken document summarization, 2007, INTERSPEECH.

[20] Fernando Pereira, et al. Shallow Parsing with Conditional Random Fields, 2003, NAACL.

[21] Li-Rong Dai, et al. Sequence training of multiple deep neural networks for better performance and faster training speed, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).