An Exploration of Entity Models, Collective Classification and Relation Description

Traditional information retrieval typically represents data using a bag of words; data mining typically uses a highly structured database representation. This paper explores the middle ground using a representation which we term entity models, in which questions about structured data may be posed and answered, but the complexities and task-specific restrictions of ontologies are avoided. An entity model is a language model or word distribution associated with an entity, such as a person, place, or organization. Using these per-entity language models, entities may be clustered, links may be detected or described with a short summary, entities may be collectively classified, and question answering may be performed. On a corpus of entities extracted from newswire and the Web, we group entities by profession with 90% accuracy, improve accuracy further on the task of classifying politicians as liberal or conservative using collective classification and conditional random fields, and answer questions about "who a person is" with a mean reciprocal rank (MRR) of 0.52.
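The two building blocks of the abstract can be made concrete with a short sketch: an entity model as a unigram word distribution over an entity's context words, and the mean reciprocal rank metric used for the question-answering evaluation. This is a minimal illustration, not the paper's implementation; the function names and the whitespace tokenization are assumptions for the example.

```python
from collections import Counter

def entity_model(contexts):
    """Build a unigram language model for an entity: a word distribution
    estimated from the text contexts in which the entity is mentioned.
    Tokenization here is simple whitespace splitting (an assumption)."""
    counts = Counter(word for sentence in contexts for word in sentence.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def mean_reciprocal_rank(ranked_lists, gold_answers):
    """MRR: for each query, take 1/rank of the first correct answer in the
    ranked candidate list (0 if absent), then average over all queries."""
    total = 0.0
    for candidates, gold in zip(ranked_lists, gold_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

With such per-entity distributions in hand, clustering or centroid-based classification by profession reduces to comparing word distributions (e.g., with KL divergence or cosine similarity), and the MRR of 0.52 reported in the abstract corresponds to the correct answer appearing, on average, near rank 2.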
