A novel method for data conflict resolution using multiple rules

In data integration, data conflict resolution is a crucial issue that is closely tied to the quality of the integrated data. Current research focuses on resolving data conflicts on a single attribute; it considers neither the differing conflict degrees of different attributes nor the interrelationship between conflict resolution on different attributes, which can reduce the accuracy of the resolution results. This paper proposes a novel two-stage data conflict resolution approach based on Markov Logic Networks. Our approach divides attributes according to their conflict degree and then resolves data conflicts in two steps: (1) For the weakly conflicting attributes, we apply a few common rules, such as voting and mutual implication between facts. (2) We then resolve the strongly conflicting attributes based on the results of the first step. In this step, additional rules are added to the rule set, such as inter-dependency between sources and facts, mutual dependency between sources, and the influence of weakly conflicting attributes on strongly conflicting attributes. Experimental results on a large amount of real-world data collected from two domains show that the proposed approach can significantly improve the accuracy of data conflict resolution.
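
The two-stage idea can be illustrated with a minimal Python sketch. It is not the Markov Logic Network formulation proposed in the paper: weighted voting stands in for MLN inference, and the conflict-degree measure (share of claims that disagree with the most common value), the 0.5 threshold, the smoothed source weights, and all function names are assumptions made only for illustration.

```python
from collections import Counter, defaultdict


def conflict_degree(values):
    """Assumed measure: share of claims that disagree with the most common value."""
    top_count = Counter(values).most_common(1)[0][1]
    return 1.0 - top_count / len(values)


def vote(claims, weights=None):
    """Weighted vote over {source: value}; unweighted when weights is None.

    Ties are broken by taking the smallest value in sorted order.
    """
    tally = defaultdict(float)
    for source, value in claims.items():
        tally[value] += (weights or {}).get(source, 1.0)
    return max(sorted(tally), key=tally.get)


def resolve(claims_by_attr, threshold=0.5):
    """claims_by_attr maps attribute -> {source: claimed value}."""
    # Split attributes by conflict degree (threshold is a hypothetical choice).
    weak = {
        attr for attr, claims in claims_by_attr.items()
        if conflict_degree(list(claims.values())) < threshold
    }

    # Stage 1: plain voting on weakly conflicting attributes.
    resolved = {attr: vote(claims_by_attr[attr]) for attr in weak}

    # Estimate how often each source agreed with the stage-1 results.
    hits, total = defaultdict(int), defaultdict(int)
    for attr in weak:
        for source, value in claims_by_attr[attr].items():
            total[source] += 1
            hits[source] += int(value == resolved[attr])
    # Laplace-smoothed agreement rate used as the source weight.
    weights = {s: (hits[s] + 1) / (total[s] + 2) for s in total}

    # Stage 2: weighted voting on strongly conflicting attributes.
    for attr in claims_by_attr.keys() - weak:
        resolved[attr] = vote(claims_by_attr[attr], weights)
    return resolved


claims = {
    "title": {"s1": "A", "s2": "A", "s3": "A", "s4": "B"},
    "year": {"s1": "2009", "s2": "2009", "s3": "2010", "s4": "2009"},
    "price": {"s1": "20", "s2": "20", "s3": "12", "s4": "12"},
}
result = resolve(claims)
print(result)
```

In this toy input, "title" and "year" are weakly conflicting and are settled by plain voting. "price" is split evenly between two values, so unweighted voting alone cannot separate them; the weights learned in the first stage favor s1 and s2, which agreed with both stage-1 results, and the sketch selects "20".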
