Paper Information - Extraction of Key Words from News Stories

Extraction of Key Words from News Stories

Abstract: In this work, we consider the task of extracting key-words such as key-players, key-locations, key-nouns and key-verbs from news stories. We cast this as a classification problem in which we assign an appropriate label to each word in a news story. We consider three statistical models: the naive Bayes model, the hidden Markov model and the maximum entropy model. We also experiment with various features. Our results indicate that a maximum entropy model that ignores contextual features and considers only word-based features, combined with stopping and stemming, yields the best performance. We find that extracting key-verbs and key-nouns is a much harder problem than extracting key-players and key-locations.
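To make the per-word classification framing concrete, here is a minimal sketch in Python. It is not the authors' implementation. It uses a naive Bayes labeler (one of the three model families named in the abstract, chosen here only because it fits in a few lines of standard-library code) rather than the maximum entropy model that the abstract reports as best. The label names, the tiny stopword list, the `crude_stem` helper, the capitalization feature and the training pairs are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "in", "of", "on", "to", "and"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def features(word):
    """Word-based features only (no context): the stem and a capitalization flag."""
    return [f"stem={crude_stem(word)}", f"cap={word[0].isupper()}"]


class NaiveBayesWordLabeler:
    """Naive Bayes over word-based features with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, labeled_words):
        for word, label in labeled_words:
            if word.lower() in STOPWORDS:  # stopping: skip stopwords entirely
                continue
            self.label_counts[label] += 1
            for feat in features(word):
                self.feat_counts[label][feat] += 1
                self.vocab.add(feat)

    def predict(self, word):
        if word.lower() in STOPWORDS:
            return "OTHER"
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior plus smoothed log likelihood of each feature
            score = math.log(count / total)
            denom = sum(self.feat_counts[label].values()) + self.alpha * len(self.vocab)
            for feat in features(word):
                score += math.log((self.feat_counts[label][feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled words, for illustration only.
TRAIN = [
    ("Clinton", "PLAYER"), ("Clinton", "PLAYER"), ("Annan", "PLAYER"),
    ("Baghdad", "LOCATION"), ("Geneva", "LOCATION"),
    ("visited", "VERB"), ("signed", "VERB"),
    ("talks", "NOUN"), ("treaty", "NOUN"),
    ("the", "OTHER"),
]

model = NaiveBayesWordLabeler()
model.fit(TRAIN)
for token in ["Clinton", "visiting", "the"]:
    print(token, model.predict(token))
```

Each word is labeled independently of its neighbors, which mirrors the abstract's observation that ignoring contextual features worked best. Swapping in a maximum entropy (multinomial logistic regression) classifier over the same features would be closer to the configuration the abstract reports as strongest.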

James Allan | Ramesh Nallapati | Sridhar Mahadevan
