Paper Information - Automatic Summarization of Highly Spontaneous Speech


This paper addresses the summarization of highly spontaneous speech. Speech is first converted into text by an automatic speech recognition (ASR) system and then segmented into tokens. Human-made tokenization is compared with automatic, prosody-based tokenization. The resulting sentence-like units are analysed by a syntactic parser to support automatic sentence selection for the summary. The preprocessed sentences are ranked by thematic terms and sentence position. The thematic term score is computed in two ways: with TF-IDF and with Latent Semantic Indexing. The sentence score is calculated as a linear combination of the thematic term score and a sentence position score. To generate the summary, the 10 top-ranked candidates for the most informative, best-summarizing sentences are selected. The system showed comparable performance (recall: 0.62, precision: 0.79, F-measure: 0.68) with the prosody-based tokenization approach. A subjective evaluation on a Likert scale was also carried out.
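The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the weight `alpha`, the linearly decaying position score, the max-normalization, and the choice to treat each sentence as its own "document" for IDF are all assumptions made here for illustration. Only the TF-IDF variant is shown; the Latent Semantic Indexing variant is omitted.

```python
import math
from collections import Counter


def tfidf_scores(sentences):
    """Sum of TF-IDF weights per tokenized sentence.

    Assumption: each sentence counts as one 'document' for the IDF statistics.
    """
    n = len(sentences)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for sent in sentences for term in set(sent))
    scores = []
    for sent in sentences:
        if not sent:
            scores.append(0.0)
            continue
        tf = Counter(sent)
        scores.append(
            sum((count / len(sent)) * math.log(n / df[term])
                for term, count in tf.items())
        )
    return scores


def position_scores(n):
    """Hypothetical position score: decays linearly from 1 (first) to 1/n (last)."""
    return [(n - i) / n for i in range(n)]


def normalize(values):
    """Scale scores to [0, 1] by the maximum so both components are comparable."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def summarize(sentences, alpha=0.7, top_k=10):
    """Return indices of the top_k sentences, restored to document order.

    The combined score is alpha * thematic + (1 - alpha) * position;
    alpha is an assumed weight, not a value taken from the paper.
    """
    thematic = normalize(tfidf_scores(sentences))
    position = position_scores(len(sentences))
    combined = [alpha * t + (1 - alpha) * p for t, p in zip(thematic, position)]
    ranked = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:top_k])


# Toy input: three already-tokenized sentence-like units.
units = [["a", "b"], ["a", "a"], ["c", "d", "e"]]
selected = summarize(units, alpha=0.7, top_k=2)
```

Setting `alpha=0.0` reduces the ranking to sentence position alone, and `alpha=1.0` to the thematic term score alone, which makes it easy to see how each component contributes.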

András Beke | György Szaszák
