Predicting Object Types in Linked Data by Text Classification

Type information about objects is very valuable in linked data. However, many linked datasets have incomplete type information. Traditional type inference finds missing types by reasoning, but it can fail on data whose schema is incomplete or incorrect. In this paper, we propose a text-classification approach to type prediction in linked data. An Object Graph is proposed as the data model. A Virtual Document of Type Information is constructed for each object, and two strategies are proposed to characterize the inductiveness of the different parts of the virtual document for type prediction. Two classifiers are trained to categorize each object into a set of types. Experiments show that type prediction by text classification is feasible and performs well.
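To make the idea concrete, the following is a minimal sketch of type prediction as text classification, not the method evaluated in the paper. The toy triples, the `ex:` prefix convention for telling resources from literals, the part weights in `WEIGHTS`, and the choice of a multinomial naive Bayes classifier are all assumptions made for illustration. The sketch also assigns a single type per object, whereas the approach described above assigns a set of types.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy triples (subject, predicate, object); not data from the paper.
TRIPLES = [
    ("ex:Berlin", "rdfs:label", "Berlin city"),
    ("ex:Berlin", "ex:population", "3600000"),
    ("ex:Berlin", "ex:country", "ex:Germany"),
    ("ex:Paris", "rdfs:label", "Paris city"),
    ("ex:Paris", "ex:population", "2100000"),
    ("ex:Einstein", "rdfs:label", "Albert Einstein physicist"),
    ("ex:Einstein", "ex:birthPlace", "ex:Ulm"),
    ("ex:Curie", "rdfs:label", "Marie Curie physicist"),
    ("ex:Curie", "ex:birthPlace", "ex:Warsaw"),
    ("ex:Munich", "rdfs:label", "Munich city"),
    ("ex:Munich", "ex:population", "1500000"),
]

# Objects whose type is already known; these serve as training data.
KNOWN_TYPES = {
    "ex:Berlin": "City",
    "ex:Paris": "City",
    "ex:Einstein": "Person",
    "ex:Curie": "Person",
}

# Assumed weights: how much each part of the virtual document counts.
WEIGHTS = {"predicate": 2, "literal": 1}


def virtual_document(obj, triples):
    """Build a weighted bag of terms from the triples describing `obj`."""
    doc = Counter()
    for subject, predicate, value in triples:
        if subject != obj:
            continue
        doc["p:" + predicate] += WEIGHTS["predicate"]
        # Assumption: values starting with "ex:" are resources, others literals.
        if not value.startswith("ex:"):
            for token in value.lower().split():
                if token.isalpha():  # skip numeric literals
                    doc["w:" + token] += WEIGHTS["literal"]
    return doc


def train(docs, labels):
    """Collect per-type term counts for a multinomial naive Bayes model."""
    class_counts = Counter(labels.values())
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for obj, doc in docs.items():
        term_counts[labels[obj]].update(doc)
        vocabulary.update(doc)
    return class_counts, term_counts, vocabulary


def predict(doc, model):
    """Return the type with the highest Laplace-smoothed log score."""
    class_counts, term_counts, vocabulary = model
    total = sum(class_counts.values())
    best_type, best_score = None, -math.inf
    for type_name, count in class_counts.items():
        score = math.log(count / total)
        denominator = sum(term_counts[type_name].values()) + len(vocabulary)
        for term, weight in doc.items():
            score += weight * math.log((term_counts[type_name][term] + 1) / denominator)
        if score > best_score:
            best_type, best_score = type_name, score
    return best_type


TRAINING_DOCS = {obj: virtual_document(obj, TRIPLES) for obj in KNOWN_TYPES}
MODEL = train(TRAINING_DOCS, KNOWN_TYPES)
```

Calling `predict(virtual_document("ex:Munich", TRIPLES), MODEL)` classifies the untyped object `ex:Munich` from its label words and predicates alone, with no schema reasoning involved. Changing the values in `WEIGHTS` is one simple way to express that some parts of the virtual document are more indicative of type than others.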
