Automatic Summarization

This paper proposes an automatic speech summarization technique for English. In the proposed method, a set of words that maximizes a summarization score, which indicates the appropriateness of the summary, is extracted from automatically transcribed speech and concatenated to create a summary. The extraction is performed with a Dynamic Programming (DP) technique according to a target compression ratio. In this paper, English broadcast news speech transcribed by a speech recognizer is automatically summarized. To apply our method, originally proposed for Japanese, to English, we modify the model that estimates word concatenation probabilities from the dependency structure of the original speech, as given by a Stochastic Dependency Context-Free Grammar (SDCFG). A summarization method for multiple utterances using a two-level DP technique is also proposed. The automatically summarized sentences are evaluated with a summarization accuracy measure based on comparison with manual summaries that human subjects produced from correctly transcribed speech. Experimental results show that the proposed method effectively extracts relatively important information and removes redundant and irrelevant information from English news speech.
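The DP word extraction and its two-level extension to multiple utterances can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each word is assumed to carry a precomputed significance score, and a hypothetical `concat(prev, next)` function stands in for the SDCFG-based word concatenation score. The summary score is taken to be the sum of the kept words' significance scores plus the concatenation scores of adjacent kept words. The first level finds the best order-preserving summary of every length for one utterance; the second level splits a global word budget, set by the compression ratio, across utterances. Concatenation across utterance boundaries is ignored, which is a simplifying assumption.

```python
from typing import Callable, List, Tuple

NEG_INF = float("-inf")
# Assumed inputs (hypothetical): (word, significance score) pairs, and a
# concat() score for placing one word directly after another in the summary.
Utterance = List[Tuple[str, float]]
Concat = Callable[[str, str], float]


def first_level(utt: Utterance, concat: Concat) -> Tuple[List[float], List[List[str]]]:
    """Best score and word sequence for every summary length m = 0..len(utt)."""
    n = len(utt)
    # best[m][j]: best score of an m-word summary whose last kept word is utt[j]
    best = [[NEG_INF] * n for _ in range(n + 1)]
    back = [[-1] * n for _ in range(n + 1)]
    for j in range(n):
        best[1][j] = utt[j][1]
    for m in range(2, n + 1):
        for j in range(m - 1, n):
            for i in range(m - 2, j):  # previous kept word; original order is preserved
                cand = best[m - 1][i] + concat(utt[i][0], utt[j][0]) + utt[j][1]
                if cand > best[m][j]:
                    best[m][j], back[m][j] = cand, i
    scores: List[float] = [0.0]  # keeping no words scores 0
    summaries: List[List[str]] = [[]]
    for m in range(1, n + 1):
        j = max(range(n), key=best[m].__getitem__)
        scores.append(best[m][j])
        path = []
        for k in range(m, 0, -1):  # follow back-pointers through m kept words
            path.append(utt[j][0])
            j = back[k][j]
        summaries.append(path[::-1])
    return scores, summaries


def two_level(utts: List[Utterance], concat: Concat, ratio: float) -> Tuple[float, List[List[str]]]:
    """Second level: distribute round(ratio * total words) across utterances."""
    tables = [first_level(u, concat) for u in utts]
    budget = round(ratio * sum(len(u) for u in utts))
    # total[k]: best combined score using exactly k words in the utterances seen so far
    total = [0.0] + [NEG_INF] * budget
    choices = []  # choices[u][k]: words taken from utterance u when k words are used through u
    for scores, _ in tables:
        new = [NEG_INF] * (budget + 1)
        choice = [0] * (budget + 1)
        for k in range(budget + 1):
            for m in range(min(k, len(scores) - 1) + 1):
                if total[k - m] > NEG_INF and total[k - m] + scores[m] > new[k]:
                    new[k], choice[k] = total[k - m] + scores[m], m
        total = new
        choices.append(choice)
    k, picked = budget, []
    for (_, summaries), choice in zip(reversed(tables), reversed(choices)):
        picked.append(summaries[choice[k]])
        k -= choice[k]
    return total[budget], picked[::-1]


if __name__ == "__main__":
    # Toy significance scores, made up for illustration only.
    utts = [
        [("the", 0.1), ("president", 2.0), ("said", 0.5), ("today", 0.8)],
        [("markets", 1.5), ("fell", 1.2), ("sharply", 0.6)],
    ]
    print(two_level(utts, lambda a, b: 0.0, 0.5))
```

With a zero concatenation score, the toy example keeps 4 of 7 words and selects `[["president", "today"], ["markets", "fell"]]`. The triple loop in `first_level` is cubic in utterance length, which is acceptable for a sketch; a real system would plug in model-based significance and concatenation scores.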
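The evaluation compares an automatic summary with manual summaries. As a rough illustration only, the sketch below scores a summary by word accuracy, 1 minus the word-level edit distance (substitutions, deletions, insertions) divided by the reference length, and takes the best value over several manual summaries. Scoring against each reference separately and taking the maximum is an assumption and a simplification; it is not claimed to be the paper's exact accuracy measure.

```python
from typing import Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of summary word h
                         prev[j - 1] + (r != h))  # match (0) or substitution (1)
        prev = cur
    return prev[-1]


def word_accuracy(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """1 - errors / reference length; can be negative with many insertions."""
    return 1.0 - edit_distance(hyp, ref) / len(ref)


def summarization_accuracy(hyp: Sequence[str], refs: Sequence[Sequence[str]]) -> float:
    """Assumed simplification: best word accuracy over the manual summaries."""
    return max(word_accuracy(hyp, ref) for ref in refs)
```

For example, the summary "president said markets fell" against the manual summary "president said markets fell sharply" has one deletion, giving a word accuracy of 0.8. An empty reference is not handled and would raise `ZeroDivisionError`.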
