Paper information - Unsupervised Hidden Topic Framework for Extracting Keywords (Synonym, Homonym, Hyponymy and Polysemy) and Topics in Meeting Transcripts


A keyword is an important item in a document that provides efficient access to its content. It can be used to search for information or to decide whether to read a document. This paper focuses on extracting hidden topics from meeting transcripts. Existing systems handle web documents, whereas the proposed framework addresses the synonym, homonym, hyponymy, and polysemy problems in meeting transcripts. For the synonym problem, different words with similar meanings are grouped and a single keyword is extracted. For the hyponymy problem, a word denoting a subclass is identified and the keyword for its superclass is extracted. A homonym is a word that has two or more different meanings. For example, "left" can appear in two different contexts: "the car left" (past tense of "leave") and "left side" (opposite of "right"). A polysemous word has different but related senses. For example, "count" has related meanings: to say numbers in the right order, and to calculate. Hidden topics in meeting transcripts are found using the LDA model. Finally, a MaxEnt classifier is used to extract keywords and topics, which are then used for information retrieval.
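The synonym and hyponymy handling described in the abstract can be illustrated with a minimal sketch. The abstract does not say which lexical resource or algorithm the authors use, so everything here is an assumption for illustration: the toy `SYNONYMS` and `HYPERNYMS` tables and the `normalize` and `extract_keywords` helpers are hypothetical names, not the paper's method. Homonym and polysemy resolution, which need context, and the LDA and MaxEnt stages are not covered.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption): maps each synonym to one canonical keyword.
SYNONYMS = {"automobile": "car", "auto": "car", "purchase": "buy"}

# Hypothetical toy taxonomy (assumption): maps a subclass word to its superclass.
HYPERNYMS = {"car": "vehicle", "truck": "vehicle", "apple": "fruit"}


def normalize(token):
    """Collapse a token to its canonical synonym, then to its superclass if known."""
    token = token.lower()
    token = SYNONYMS.get(token, token)   # synonym problem: group similar words
    return HYPERNYMS.get(token, token)   # hyponymy problem: use the superclass keyword


def extract_keywords(tokens, top_n=3):
    """Return the top_n most frequent normalized tokens as candidate keywords."""
    counts = Counter(normalize(t) for t in tokens)
    return [word for word, _ in counts.most_common(top_n)]


tokens = "We should buy a truck or an automobile not another car".split()
top_keyword = extract_keywords(tokens, top_n=1)
```

In this toy utterance, "truck", "automobile", and "car" all collapse to the single candidate keyword "vehicle", so it outranks every other token. A real system would need stop-word filtering and a proper lexical database in place of the hand-written tables.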

J. I. Sheeba | K. Vivekanandan | G. Sabitha | P. Padmavathi
