Type Prediction for Efficient Coreference Resolution in Heterogeneous Semantic Graphs

We describe an approach to entity type recognition in heterogeneous semantic graphs that reduces the computational cost of coreference resolution. Our research specifically addresses semi-structured text whose ontologies are uninformative or unknown. The problem resembles coreference resolution in unstructured text, where entities and their types are identified using contextual information and linguistic analysis. Semantic graphs, by contrast, are semi-structured, carry very little contextual information, and have trivial grammars that convey no additional information, so coreference resolution is challenging when the ontologies are not known. Our work uses a supervised machine learning algorithm and entity type dictionaries to map attributes to a common attribute space. We evaluated the approach in experiments using data from Wikipedia, Freebase, and ArnetMiner.
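
The idea can be illustrated with a minimal sketch; it is not the implementation evaluated in the paper. The abstract does not name the learning algorithm, so a small Bernoulli naive Bayes classifier stands in for it here as an assumption. The dictionary `ATTRIBUTE_DICTIONARY`, the attribute names, the type labels, and the entity identifiers are all hypothetical. Attributes from different sources are first mapped to a common attribute space, a type is predicted for each entity, and coreference candidates are then generated only among entities that share a predicted type.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical dictionary mapping source-specific attribute names
# to a common attribute space (illustrative entries only).
ATTRIBUTE_DICTIONARY = {
    "born": "date_of_birth",
    "birthdate": "date_of_birth",
    "employer": "organization",
    "affiliation": "organization",
    "inhabitants": "population",
    "lat": "geo",
    "latitude": "geo",
}


def to_common_space(attributes):
    """Map raw attribute names to the common attribute space."""
    return frozenset(
        ATTRIBUTE_DICTIONARY.get(name.lower(), name.lower()) for name in attributes
    )


class NaiveBayesTypePredictor:
    """Bernoulli naive Bayes over common-space attributes (an assumed stand-in)."""

    def fit(self, examples):
        # examples: iterable of (attribute names, type label) pairs
        self.type_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for attributes, label in examples:
            features = to_common_space(attributes)
            self.type_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocabulary |= features
        return self

    def predict(self, attributes):
        features = to_common_space(attributes)
        total = sum(self.type_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.type_counts.items():
            score = math.log(count / total)
            for feature in self.vocabulary:
                # Laplace-smoothed probability that this type has the attribute.
                p = (self.feature_counts[label][feature] + 1) / (count + 2)
                score += math.log(p if feature in features else 1.0 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def candidate_pairs(entities, predictor):
    """Pair up only entities whose predicted types agree."""
    blocks = defaultdict(list)
    for entity_id, attributes in entities.items():
        blocks[predictor.predict(attributes)].append(entity_id)
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


# Hypothetical training data and unlabeled entities from two sources.
training = [
    (["born", "employer"], "person"),
    (["birthdate", "affiliation"], "person"),
    (["population", "lat"], "place"),
    (["inhabitants", "latitude"], "place"),
]
entities = {
    "src_a:1": ["date_of_birth", "affiliation"],
    "src_b:7": ["born", "employer"],
    "src_a:2": ["inhabitants", "lat"],
    "src_b:9": ["population", "latitude"],
}

predictor = NaiveBayesTypePredictor().fit(training)
pairs = candidate_pairs(entities, predictor)
all_pairs = list(combinations(sorted(entities), 2))
```

In this toy setting, comparing every entity with every other would require six pairwise comparisons, whereas restricting candidates to entities of the same predicted type leaves two. The saving grows with the number of entities and types, at the risk of missing coreferent pairs whenever a type is mispredicted.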
