Paper information - Entity Resolution in Texts Using Statistical Learning and Ontologies - 字舞流文

Entity Resolution in Texts Using Statistical Learning and Ontologies

Ambiguities, which are inherently present in natural languages, make it challenging to determine the actual identities of entities mentioned in a document (e.g., Paris can refer to a city in France, but it can also refer to a small city in Texas, USA, or to a 1984 film directed by Wim Wenders titled Paris, Texas). Disambiguation is a problem that entity resolution methods can solve successfully. This paper studies various methods for estimating relatedness between entities, as used in collective entity resolution. We define a unified entity resolution approach capable of using implicit as well as explicit relatedness to collectively identify in-text entities. As a relatedness measure, we propose a method that expresses relatedness using the heterogeneous relations of a domain ontology. We also experiment with other relatedness measures, such as statistical learning of co-occurrences of two entities or content similarity between them. Evaluation on real data shows that the new methods for relatedness estimation give good results.
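One of the relatedness measures named in the abstract, statistical learning of entity co-occurrences, can be illustrated with a small sketch. The code below is not the paper's method: it assumes pointwise mutual information (PMI) as the co-occurrence statistic, uses a made-up toy corpus, and resolves a single mention by summing relatedness to already-known context entities rather than resolving all mentions collectively. The names `DOCS`, `pmi_table`, `relatedness`, and `resolve` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical): each document is the set of entity IDs it mentions.
DOCS = [
    {"Paris_France", "France", "Eiffel_Tower"},
    {"Paris_France", "France"},
    {"Paris_Texas", "Texas", "Dallas"},
    {"Paris_Texas", "Texas"},
    {"France", "Eiffel_Tower"},
    {"Texas", "Dallas"},
]


def pmi_table(docs):
    """Compute PMI for every pair of entities that co-occur in a document."""
    n = len(docs)
    single = Counter(e for d in docs for e in d)
    pair = Counter(
        frozenset(p) for d in docs for p in combinations(sorted(d), 2)
    )
    table = {}
    for key, count in pair.items():
        a, b = tuple(key)
        p_ab = count / n
        p_a, p_b = single[a] / n, single[b] / n
        table[key] = math.log(p_ab / (p_a * p_b))
    return table


def relatedness(table, a, b):
    """Look up PMI; pairs never seen together get 0.0 (a simplifying assumption)."""
    return table.get(frozenset((a, b)), 0.0)


def resolve(candidates, context, table):
    """Pick the candidate with the highest total relatedness to the context."""
    return max(
        candidates,
        key=lambda c: sum(relatedness(table, c, e) for e in context),
    )


table = pmi_table(DOCS)
# The mention "Paris" next to Texas and Dallas resolves to the Texan city.
print(resolve(["Paris_France", "Paris_Texas"], ["Texas", "Dallas"], table))
```

Swapping `relatedness` for a content-similarity or ontology-based score would leave `resolve` unchanged, which mirrors the abstract's point that one resolution approach can draw on different relatedness measures.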

Dunja Mladenić | Tadej Štajner

[1] Hang Li, et al. Word Clustering and Disambiguation Based on Co-occurrence Data, 1998, COLING.

[2] Dan Roth, et al. Semantic Integration in Text: From Ambiguous Names to Identifiable Entities, 2005, AI Mag.

[3] Dunja Mladenic, et al. Text Mining-Machine Learning on Documents, 2005.

[4] Jeffrey M. Bradshaw, et al. Applying KAoS Services to Ensure Policy Compliance for Semantic Web Services Workflow Composition and Enactment, 2004, SEMWEB.

[5] Andrew McCallum, et al. Information Extraction, 2005, ACM Queue.

[6] Alex E. Bell. UML Fever: Diagnosis and Recovery, 2005, ACM Queue.

[7] Razvan C. Bunescu, et al. Using Encyclopedic Knowledge for Named Entity Disambiguation, 2006, EACL.

[8] William E. Winkler, et al. The State of Record Linkage and Current Research Problems, 1999.

[9] Silviu Cucerzan, et al. Large-Scale Named Entity Disambiguation Based on Wikipedia Data, 2007, EMNLP.

[10] Daniel Gruhl, et al. Disambiguation of References to Individuals, 2005.

[11] Ian Horrocks, et al. The Semantic Web: The Roles of XML and RDF, 2000, IEEE Internet Comput.

[12] Laura M. Haas, et al. Transforming Heterogeneous Data with Database Middleware: Beyond Integration, 1999, IEEE Data Eng. Bull.

[13] Dmitri V. Kalashnikov, et al. A probabilistic model for entity disambiguation using relationships, 2004.

[14] Dmitri V. Kalashnikov, et al. Adaptive graphical approach to entity resolution, 2007, JCDL '07.

[15] Pradeep Ravikumar, et al. A Comparison of String Distance Metrics for Name-Matching Tasks, 2003, IIWeb.

[16] Kenneth Ward Church, et al. Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.

[17] Jens Lehmann, et al. DBpedia: A Nucleus for a Web of Open Data, 2007, ISWC/ASWC.

[18] Gerhard Weikum, et al. SOFIE: a self-organizing framework for information extraction, 2009, WWW '09.

[19] Christopher D. Manning, et al. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, 2005, ACL.

[20] Craig A. Knoblock, et al. Learning object identification rules for information integration, 2001, Inf. Syst.

[21] Ivan P. Fellegi, et al. A Theory for Record Linkage, 1969.

[22] Sung-Hyon Myaeng, et al. Using Mutual Information to Resolve Query Translation Ambiguities and Query Term Weighting, 1999, ACL.

[23] Lise Getoor, et al. Collective entity resolution in relational data, 2007, TKDD.

[24] Jeremy J. Carroll, et al. Resource description framework (RDF) concepts and abstract syntax, 2003.

[25] Hang Li, et al. Word Clustering and Disambiguation Based on Co-occurrence Data, 1998, COLING.

[26] Gerhard Weikum, et al. YAGO: A Core of Semantic Knowledge, 2007, WWW '07.

[27] Stefan M. Rüger, et al. Place Disambiguation with Co-occurrence Models, 2006, CLEF.

[28] Rada Mihalcea, et al. Unsupervised Large-Vocabulary Word Sense Disambiguation with Graph-based Algorithms for Sequence Data Labeling, 2005, HLT.

[29] Gerard Salton, et al. A vector space model for automatic indexing, 1975, CACM.

[30] Oren Etzioni, et al. Unsupervised Resolution of Objects and Relations on the Web, 2007, NAACL.

[31] Amit P. Sheth, et al. Discovering informative connection subgraphs in multi-relational graphs, 2005, SKDD.

[32] Hinrich Schütze, et al. Automatic Word Sense Discrimination, 1998, Comput. Linguistics.

[33] Dunja Mladenic, et al. Visualization of Text Document Corpus, 2005, Informatica.

[34] Pedro M. Domingos, et al. Entity Resolution with Markov Logic, 2006, Sixth International Conference on Data Mining (ICDM'06).

[35] Ahmed K. Elmagarmid, et al. Duplicate Record Detection: A Survey, 2007, IEEE Transactions on Knowledge and Data Engineering.

[36] David Yarowsky, et al. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, 1995, ACL.

[37] Razvan C. Bunescu, et al. Integrating Co-occurrence Statistics with Information Extraction for Robust Retrieval of Protein Interactions from Medline, 2006, BioNLP@NAACL-HLT.