A novel method for data conflict resolution using multiple rules

In data integration, data conflict resolution is a crucial issue that is closely tied to the quality of the integrated data. Current research focuses on resolving conflicts on a single attribute; it considers neither the conflict degree of different attributes nor the interrelationship of conflict resolution across attributes, which can reduce the accuracy of the resolution results. This paper proposes a novel two-stage data conflict resolution method based on Markov Logic Networks. The approach divides attributes according to their conflict degree and then resolves data conflicts in two steps: (1) For the weakly conflicting attributes, we exploit a few common rules to resolve data conflicts, such as voting and mutual implication between facts. (2) We then resolve the strongly conflicting attributes based on the results of the first step. In this step, additional rules are added to the rule set, such as inter-dependency between sources and facts, mutual dependency between sources, and the influence of weakly conflicting attributes on strongly conflicting attributes. Experimental results on a large amount of real-world data collected from two domains show that the proposed approach can significantly improve the accuracy of data conflict resolution.
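
The two-stage idea in the abstract can be illustrated with a small Python sketch. This is not the paper's method: the Markov Logic Network is replaced here by plain weighted voting, and the conflict-degree definition, the 0.5 threshold, the function names, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter

def conflict_degree(values):
    # Assumed definition: share of claims that disagree with the most common value.
    top_count = Counter(values).most_common(1)[0][1]
    return 1 - top_count / len(values)

def split_attributes(claims, threshold=0.5):
    # claims maps attribute -> {source: claimed value}; threshold is hypothetical.
    weak, strong = [], []
    for attr, by_source in claims.items():
        degree = conflict_degree(list(by_source.values()))
        (weak if degree < threshold else strong).append(attr)
    return weak, strong

def vote(by_source, weights=None):
    # Plain voting when weights is None, otherwise source-weighted voting.
    scores = Counter()
    for source, value in by_source.items():
        scores[value] += 1 if weights is None else weights.get(source, 1)
    return scores.most_common(1)[0][0]

def resolve(claims, threshold=0.5):
    weak, strong = split_attributes(claims, threshold)
    # Stage 1: resolve weakly conflicting attributes by plain voting.
    resolved = {attr: vote(claims[attr]) for attr in weak}
    # Stand-in for the paper's extra rules: a source gains weight each time
    # it agreed with a stage-1 result.
    weights = Counter()
    for attr in weak:
        for source, value in claims[attr].items():
            if value == resolved[attr]:
                weights[source] += 1
    weights = {source: 1 + count for source, count in weights.items()}
    # Stage 2: resolve strongly conflicting attributes with those weights.
    for attr in strong:
        resolved[attr] = vote(claims[attr], weights)
    return resolved

# Toy data, invented for this sketch.
claims = {
    "title": {"s1": "Data Fusion", "s2": "Data Fusion", "s3": "Data Fusion",
              "s4": "Data Fusion", "s5": "Data Fuson", "s6": "Data Fusion?"},
    "year": {"s1": "2009", "s2": "2009", "s3": "2009",
             "s4": "2008", "s5": "2009", "s6": "2010"},
    "publisher": {"s1": "Pub A", "s2": "Pub A", "s3": "Pub C",
                  "s4": "Pub B", "s5": "Pub B", "s6": "Pub B"},
}
result = resolve(claims)
```

In this toy data, plain voting on the strongly conflicting attribute would pick the value backed by the three less reliable sources, while the stage-2 weights derived from the weakly conflicting attributes favor the two sources that were consistently right in stage 1.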

Peng Zhao-Hui | Li Qingzhong | Zhang Yong-Xin
