Extractive Broadcast News Summarization Leveraging Recurrent Neural Network Language Modeling Techniques

Extractive text or speech summarization aims to select a set of salient sentences from an original document and concatenate them to form a summary, enabling users to browse and digest the document's content more efficiently. A recent line of research on extractive summarization employs the language modeling (LM) approach to important sentence selection, which has proven effective for unsupervised speech summarization. However, a major challenge facing the LM approach is how to formulate the sentence models and accurately estimate their parameters for each sentence of the document to be summarized. In view of this, this paper explores a novel use of the recurrent neural network language modeling (RNNLM) framework for extractive broadcast news summarization. Within this framework, the derived sentence models capture not only word-usage cues but also long-span structural information about word co-occurrence relationships in broadcast news documents, thereby dispensing with the strict bag-of-words assumption. Furthermore, different model complexities and model combinations are extensively analyzed and compared. Experimental results demonstrate the performance merits of the proposed summarization methods in comparison with several well-studied, state-of-the-art unsupervised methods.
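
To make the LM-based selection criterion concrete, the Python sketch below ranks each sentence by the likelihood its sentence model assigns to the words of the whole document and keeps the top-scoring sentences as the summary. For illustration it uses a simple unigram sentence model smoothed with a document-level background model; the unigram model, the smoothing weight `lam`, and the helper names are assumptions standing in for the richer RNNLM sentence models studied in the paper, not the paper's actual implementation.

```python
import math
from collections import Counter


def sentence_likelihood(sentence_tokens, document_tokens, background, lam=0.5):
    """Log-likelihood of the document's words under a sentence model.

    The sentence model here is a unigram distribution estimated from the
    sentence, linearly smoothed with a background distribution; this is an
    illustrative stand-in for an RNNLM-based sentence model.
    """
    counts = Counter(sentence_tokens)
    total = sum(counts.values())
    log_like = 0.0
    for w in document_tokens:
        p_sent = counts[w] / total if total else 0.0
        p_bg = background.get(w, 1e-6)
        log_like += math.log(lam * p_sent + (1.0 - lam) * p_bg)
    return log_like


def extractive_summary(sentences, ratio=0.3):
    """Rank sentences by document likelihood and keep the top fraction,
    preserving their original order in the document."""
    document_tokens = [w for s in sentences for w in s]
    doc_counts = Counter(document_tokens)
    background = {w: c / len(document_tokens) for w, c in doc_counts.items()}
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_likelihood(sentences[i], document_tokens, background),
        reverse=True,
    )
    keep = set(ranked[: max(1, int(len(sentences) * ratio))])
    return [sentences[i] for i in range(len(sentences)) if i in keep]


if __name__ == "__main__":
    doc = [
        ["the", "central", "bank", "raised", "interest", "rates"],
        ["officials", "said", "inflation", "remains", "a", "concern"],
        ["the", "bank", "said", "rates", "may", "rise", "again"],
    ]
    for s in extractive_summary(doc, ratio=0.34):
        print(" ".join(s))
```

Swapping the unigram scorer for an RNNLM amounts to replacing `sentence_likelihood` with the probability the trained recurrent model assigns to the document's word sequence, which is what lets the sentence models exploit long-span word co-occurrence structure rather than bag-of-words statistics alone.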