An Efficient Metric of Automatic Weight Generation for Properties in Instance Matching Technique

The proliferation of heterogeneous semantic knowledge bases intensifies the need for automatic instance matching techniques. The efficiency of instance matching, however, is often influenced by the weights of the properties associated with instances. Generating these weights automatically is a non-trivial but important task, and identifying an appropriate metric for it remains a formidable challenge. In this paper, we investigate an approach to generating weights automatically based on two hypotheses: (1) the weight of a property is directly proportional to the ratio of the number of its distinct values to the number of instances that contain the property, and (2) the weight is also proportional to the ratio of the number of distinct values of the property to the number of instances in a training dataset. The intuition behind our approach is the classical theory of information content, which holds that infrequent words are more informative than frequent ones. Our mathematical model derives a metric for generating property weights automatically, which is applied in an instance matching system to produce reconciled instances efficiently. Our experiments and evaluations show the effectiveness of the proposed metric for automatic property weight generation in instance matching.
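
The two ratios named in hypotheses (1) and (2) can be illustrated with a minimal Python sketch. It assumes that each instance is a dictionary mapping a property name to a single hashable value; the function names `property_ratios` and `illustrative_weight` are hypothetical. The abstract does not state how the derived metric combines the two ratios, so the product used here is only an assumption for illustration, not the paper's metric.

```python
from collections import defaultdict


def property_ratios(instances):
    """Return {property: (r1, r2)} for a list of instance dictionaries.

    r1 = distinct values / instances that contain the property  (hypothesis 1)
    r2 = distinct values / all instances in the dataset         (hypothesis 2)
    """
    distinct = defaultdict(set)    # property -> set of observed values
    containing = defaultdict(int)  # property -> number of instances having it
    for instance in instances:
        for prop, value in instance.items():
            distinct[prop].add(value)
            containing[prop] += 1
    total = len(instances)
    return {
        prop: (len(values) / containing[prop], len(values) / total)
        for prop, values in distinct.items()
    }


def illustrative_weight(r1, r2):
    # Assumption: combine the two ratios by multiplication. The actual
    # combination used by the paper's metric is not given in the abstract.
    return r1 * r2


if __name__ == "__main__":
    # Toy training data, invented purely for illustration.
    training = [
        {"name": "Alice", "type": "Person", "email": "alice@example.org"},
        {"name": "Bob", "type": "Person"},
        {"name": "Carol", "type": "Person"},
        {"name": "Dave", "type": "Person", "email": "dave@example.org"},
    ]
    for prop, (r1, r2) in property_ratios(training).items():
        print(prop, r1, r2, illustrative_weight(r1, r2))
```

In this toy data, a property such as `type`, which has the same value in every instance, receives low ratios, while `name`, whose values are all distinct, receives high ones. This matches the stated intuition that infrequent values are more informative than frequent ones.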
