Learning deep rhetorical structure for extractive speech summarization

Extractive summarization of conference and lecture speech is useful for online learning and reference. We show for the first time that deep(er) rhetorical parsing of conference speech is possible and helpful to the extractive summarization task. This type of rhetorical structure is evident in the structure of the corresponding presentation slides. We propose using Hidden Markov SVM (HMSVM) to iteratively learn the rhetorical structure of the speeches and summarize them. We show that a system based on HMSVM gives a 64.3% ROUGE-L F-measure, a 10.1% absolute increase in lecture speech summarization performance compared with the baseline system without rhetorical information. Our method also outperforms the baseline with a conventional discourse feature. Our proposed approach is more efficient than, and also improves upon, a previous method that uses shallow rhetorical structure parsing [1].
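For readers unfamiliar with the metric, the following is a minimal sketch of a sentence-level ROUGE-L F-measure, computed from the longest common subsequence (LCS) between a candidate summary and a reference summary. Whitespace tokenization, lowercasing, the balanced weighting `beta=1.0`, and the function names `lcs_length` and `rouge_l_f` are illustrative assumptions; the exact evaluation configuration behind the reported figures is not stated here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure (assumed setup, see lead-in)."""
    # Assumption: lowercase and split on whitespace.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # LCS length over candidate length
    recall = lcs / len(ref)      # LCS length over reference length
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (recall + b2 * precision)


if __name__ == "__main__":
    # Toy strings for illustration only, not data from the paper.
    print(rouge_l_f("a b c d", "a c e"))
```

With `beta=1.0` this reduces to the harmonic mean of LCS-based precision and recall; a larger `beta` weights recall more heavily.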

[1] Pascale Fung, et al. Improving lecture speech summarization using rhetorical information, 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[2] Hsin-Min Wang, et al. Extractive Chinese Spoken Document Summarization Using Probabilistic Ranking Models, 2006, ISCSLP.

[3] Marc Moens, et al. Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status, 2002, CL.

[4] William C. Mann, et al. Rhetorical Structure Theory: A Theory of Text Organization, 1987.

[5] Julia Hirschberg, et al. Discourse Structure in Spoken Language: Studies on Speech Corpora, 1995.

[6] Seiichi Nakagawa, et al. Class lecture summarization taking into account consecutiveness of important sentences, 2008, INTERSPEECH.

[7] Julia Hirschberg, et al. Automatic summarization of broadcast news using structural features, 2003, INTERSPEECH.

[8] Pascale Fung, et al. Rhetorical-State Hidden Markov Models for extractive speech summarization, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[9] Jian Zhang, et al. Active Learning of Extractive Reference Summaries for Lecture Speech Summarization, 2009, BUCC@ACL/IJCNLP.

[10] Thomas Hofmann, et al. Hidden Markov Support Vector Machines, 2003, ICML.

[11] Regina Barzilay, et al. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization, 2004, NAACL.

[12] Pascale Fung, et al. One story, one flow: Hidden Markov Story Models for multilingual multidocument summarization, 2006, TSLP.