Different ways of entering data into databases produce duplicate records, which increase database size. This problem cannot be easily ignored. Several methods exist for detecting such duplicates. In this paper, we try to increase the accuracy of duplicate detection by using cluster similarity instead of direct similarity between fields: the fields of the database are clustered, and the degree of similarity between records is derived from that clustering. By using the information already present in the database, the method obtains a more logical similarity for incomplete information. Overall, the cluster similarity method improved results by 24% compared with previous methods.
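As a rough illustration of the idea of comparing records through field clusters rather than through raw field values, the following minimal Python sketch may help. It is not the paper's algorithm: the greedy clustering, the `difflib`-based string ratio, the 0.8 threshold, the function names `cluster_field` and `cluster_similarity`, the sample records, and the choice to skip missing values are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def cluster_field(values, threshold=0.8):
    """Greedily group the distinct values of one field.

    Returns a dict mapping each value to a cluster id. The threshold
    and the greedy strategy are illustrative assumptions.
    """
    representatives = []  # first value seen in each cluster
    assignment = {}
    for value in values:
        if value is None or value in assignment:
            continue
        for cluster_id, rep in enumerate(representatives):
            ratio = SequenceMatcher(None, value.lower(), rep.lower()).ratio()
            if ratio >= threshold:
                assignment[value] = cluster_id
                break
        else:
            # No existing cluster is close enough: start a new one.
            assignment[value] = len(representatives)
            representatives.append(value)
    return assignment


def cluster_similarity(rec_a, rec_b, field_clusters):
    """Fraction of comparable fields whose values share a cluster."""
    matches = 0
    compared = 0
    for field, clusters in field_clusters.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # assumption: fields with missing values are skipped
        compared += 1
        matches += clusters[a] == clusters[b]
    return matches / compared if compared else 0.0


# Hypothetical sample data, not taken from the paper.
records = [
    {"name": "Jonathan Smith", "city": "New York"},
    {"name": "Jonathon Smith", "city": "New york"},
    {"name": "Maria Garcia", "city": "Boston"},
]
fields = ["name", "city"]
field_clusters = {f: cluster_field([r.get(f) for r in records]) for f in fields}

print(cluster_similarity(records[0], records[1], field_clusters))
print(cluster_similarity(records[0], records[2], field_clusters))
```

In this sketch, two records count as similar on a field when their values land in the same cluster, so small spelling variants no longer lower the score the way a direct string comparison would. How the original method weights fields or treats incomplete information is not stated in the abstract, so those parts are left as simple placeholders.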