Measuring Term Informativeness in Context

Measuring term informativeness is a fundamental NLP task. Existing methods, mostly based on statistical information in corpora, do not actually measure the informativeness of a term with regard to its semantic context. This paper proposes a new lightweight, feature-free approach to encode term informativeness in context by leveraging web knowledge. Given a term and its context, we model context-aware term informativeness based on the semantic similarity between the context and the term's most featured context in a knowledge base, Wikipedia. We apply our method to three applications: core term extraction from snippets (text segments), scientific keyword extraction (papers), and back-of-the-book index generation (books). The performance is state-of-the-art or close to it for each application, demonstrating the method's effectiveness and generality.
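The core idea in the abstract, scoring a term by how similar its current context is to the term's featured context in a knowledge base, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or how the featured context is retrieved, so the sketch assumes plain bag-of-words cosine similarity and a hard-coded, hypothetical string standing in for the Wikipedia context. The function names are illustrative only.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_informativeness(context, featured_context):
    """Score a term's informativeness in `context` as the similarity between
    that context and the term's featured context (assumed to be given)."""
    return cosine_similarity(bag_of_words(context), bag_of_words(featured_context))


# Hypothetical stand-in for the featured context of the term "kernel";
# in the paper this would come from Wikipedia.
FEATURED_KERNEL = "kernel operating system memory process hardware"

technical = "the linux kernel manages memory and process scheduling"
culinary = "remove the kernel from each corn cob before cooking"

score_technical = context_informativeness(technical, FEATURED_KERNEL)
score_culinary = context_informativeness(culinary, FEATURED_KERNEL)
```

Under these assumptions, "kernel" scores higher in the technical sentence than in the culinary one, because the technical context shares more vocabulary with the stand-in featured context. A realistic system would replace both the similarity measure and the hard-coded context with knowledge-base lookups.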

Zhaohui Wu | C. Lee Giles
