Research on Entities Matching across Heterogeneous Databases

Entity matching plays a crucial role in integrating multiple data sources. However, inconsistent data formats and limited data sharing across heterogeneous databases are bottlenecks for data interoperability and integration, so research on entity heterogeneity and entity matching across multiple heterogeneous databases is needed. After analyzing the limitations of existing entity matching approaches, we propose a decision model in which a formal definition of entity matching is given to identify corresponding entities. Based on this decision model, we then present a detailed entity matching algorithm. In the experiments, precision and recall are used to evaluate matching performance. The results on real-world data indicate that the proposed approach is highly effective for entity matching.
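
To illustrate the two evaluation measures named above, the following minimal sketch computes precision and recall for a set of predicted entity matches against a set of known true matches. The function name, the representation of a match as a pair of record identifiers, and the sample identifiers are all assumptions made for illustration; none of them comes from the paper, and this is not the paper's matching algorithm.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted vs. true matching pairs."""
    predicted, actual = set(predicted), set(actual)
    # A true positive is a predicted pair that is also a true match.
    true_positives = len(predicted & actual)
    # Precision: fraction of predicted matches that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of true matches that were found.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical pairs of record IDs (ID in database A, ID in database B).
predicted_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b9"), ("a4", "b4")}
true_pairs = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5"), ("a6", "b6")}

precision, recall = precision_recall(predicted_pairs, true_pairs)
```

Here three of the four predicted pairs are correct and three of the five true pairs are recovered, so precision measures how many reported matches are right while recall measures how many real matches were found.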
