Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts

This paper explores several unsupervised approaches to automatic keyword extraction from meeting transcripts. Within the TFIDF (term frequency, inverse document frequency) weighting framework, we incorporate part-of-speech (POS) information, word clustering, and sentence salience scores. We also evaluate a graph-based approach that measures the importance of a word based on its connections to other sentences or words. System performance is evaluated in several ways: comparison to human-annotated keywords using F-measure, a weighted score relative to the oracle system performance, and a novel alternative human evaluation. Our results show that the simple unsupervised TFIDF approach performs reasonably well, and that the additional information from POS and sentence scores helps keyword extraction. However, the graph method is less effective for this domain. We also ran experiments on speech recognition output and observed degraded performance and different patterns compared with human transcripts.
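To make the TFIDF baseline and the F-measure comparison concrete, the following is a minimal Python sketch, not the paper's implementation. The abstract does not give the exact weighting formula, so the plain `tf * log(N / df)` form, the function names `tfidf_keywords` and `f_measure`, the optional stopword filter, and the toy documents in the usage note are all assumptions made for illustration. POS filtering, word clustering, and sentence salience are not modeled here.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_n=5, stopwords=frozenset()):
    """Rank the words of docs[doc_index] by a plain TF-IDF score.

    docs is a list of token lists (one list per transcript).
    The weighting tf * log(N / df) is an assumed, textbook variant.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each word.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # Term frequency within the target document, skipping stopwords.
    tf = Counter(t for t in docs[doc_index] if t not in stopwords)

    scores = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]


def f_measure(predicted, reference):
    """Balanced F-measure between predicted and human-annotated keyword sets."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

As a usage example on hypothetical toy data, calling `tfidf_keywords(docs, 0, top_n=2)` with `docs = [["budget", "meeting", "budget", "the"], ["the", "schedule", "meeting"], ["the", "demo", "schedule"]]` ranks "budget" first, since it is frequent in the first transcript and absent from the others, while "the" scores zero because it occurs in every transcript.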
