Automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features

This paper proposes a set of approaches to automatically extract key terms from spoken course lectures including audio signals, ASR transcriptions and slides. We divide the key terms into two types: key phrases and keywords and develop different approaches to extract them in order. We extract key phrases using right/left branching entropy and extract keywords by learning from three sets of features: prosodic features, lexical features and semantic features from Probabilistic Latent Semantic Analysis (PLSA). The learning approaches include an unsupervised method (K-means exemplar) and two supervised ones (AdaBoost and neural network). Very encouraging preliminary results were obtained with a corpus of course lectures, and it is found that all approaches and all sets of features proposed here are useful.
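The abstract names right/left branching entropy as the signal for key phrase extraction but does not give the formula. The sketch below illustrates one common definition, offered as an assumption rather than the authors' exact formulation: the right branching entropy of a word sequence is the entropy of the distribution of words that immediately follow it, and the left branching entropy is the same quantity computed over preceding words. The function names, the whitespace-tokenized toy transcript, and the brute-force scan are all hypothetical choices made for illustration; the abstract does not state how counts are stored or how entropy values are thresholded.

```python
import math
from collections import Counter


def right_branching_entropy(tokens, phrase):
    """Entropy (in bits) of the words that immediately follow `phrase`.

    Assumed definition: H_r(X) = -sum_i p(x_i) * log2 p(x_i), where p(x_i)
    is the fraction of occurrences of X that are followed by word x_i.
    """
    phrase = tuple(phrase)
    n = len(phrase)
    # Count each word that appears directly after an occurrence of `phrase`.
    followers = Counter(
        tokens[i + n]
        for i in range(len(tokens) - n)
        if tuple(tokens[i:i + n]) == phrase
    )
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


def left_branching_entropy(tokens, phrase):
    """Mirror case: reverse both sequences and reuse the right-hand computation."""
    return right_branching_entropy(tokens[::-1], tuple(phrase)[::-1])


# Toy transcript (hypothetical data, not from the paper's corpus).
tokens = (
    "hidden markov model is a model hidden markov model can be "
    "hidden markov chain"
).split()

inside = right_branching_entropy(tokens, ("hidden",))
boundary = right_branching_entropy(tokens, ("hidden", "markov", "model"))
print(inside, boundary)
```

In this toy transcript "hidden" is always followed by "markov", so its right branching entropy is zero, while "hidden markov model" is followed by different words each time, so its entropy is higher. Under the assumed definition, that rise in entropy at a phrase's edge is the kind of cue that can mark a candidate phrase boundary.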

Yu Huang | Lin-Shan Lee | Yun-Nung Chen | Sheng-yi Kong

[1] Lin-Shan Lee,et al. IMPROVED SUMMARIZATION OF CHINESE SPOKEN DOCUMENTS BY PROBABILISTIC LATENT SEMANTIC ANALYSIS (PLSA) WITH FURTHER ANALYSIS AND INTEGRATED SCORING , 2006, 2006 IEEE Spoken Language Technology Workshop.

[2] Donald E. Knuth,et al. The art of computer programming: sorting and searching (volume 3) , 1973 .

[3] S. Haykin,et al. Neural Networks: A Comprehensive Foundation , 1994 .

[4] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.

[5] Lin-Shan Lee,et al. Improved Spoken Document Summarization Using Probabilistic Latent Semantic Analysis (PLSA) , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[6] Lin-Shan Lee,et al. Learning on demand - course lecture distillation by information extraction and semantic structuring for spoken documents , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[7] Feifan Liu,et al. Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts , 2009, NAACL.

[8] Anette Hulth,et al. Automatic Keyword Extraction Using Domain Knowledge , 2001, CICLing.

[9] Helmut Schmid,et al. Probabilistic part-of-speech tagging using decision trees , 1994 .

[10] Thomas Hofmann,et al. Probabilistic Latent Semantic Analysis , 1999, UAI.

[11] Fei Liu,et al. Automatic keyword extraction for the meeting corpus using supervised approach and bigram expansion , 2008, 2008 IEEE Spoken Language Technology Workshop.

[12] Donald E. Knuth,et al. The Art of Computer Programming: Volume 3: Sorting and Searching , 1998 .

[13] Simon Haykin,et al. Neural Networks: A Comprehensive Foundation (3rd Edition) , 2007 .

[14] Yaakov HaCohen-Kerner,et al. Automatic Extraction and Learning of Keyphrases from Scientific Articles , 2005, CICLing.

[15] Peter D. Turney. Learning Algorithms for Keyphrase Extraction , 2000, Information Retrieval.

[16] Julia Hirschberg,et al. Communication and prosody: Functional aspects of prosody , 2002, Speech Commun..

[17] Hsinchun Chen,et al. Updateable PAT-Tree Approach to Chinese Key Phrase Extraction using Mutual Information: A Linguistic Foundation for Knowledge Management , 1999 .

[18] Zhiyuan Liu,et al. Clustering to Find Exemplar Terms for Keyphrase Extraction , 2009, EMNLP.

[19] Pascale Fung,et al. Improving lecture speech summarization using rhetorical information , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[20] Donald R. Morrison,et al. PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric , 1968, J. ACM.