Data Fusion – Resolving Data Conflicts for Integration

The amount of information produced in the world increases by 30% every year, and this rate will only go up. With advanced network technology, more and more data sources are available, either over the Internet or on enterprise intranets. Modern data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, often require integrating the available data sources and providing a uniform interface through which users can access data from different sources. Such requirements have driven fruitful research on data integration over the last two decades [11, 13].

[1]  Renée J. Miller, et al. ConQuer: efficient management of inconsistent databases, 2005, SIGMOD '05.

[2]  Jon Kleinberg, et al. Authoritative sources in a hyperlinked environment, 1998, SODA '98.

[3]  Simon French, et al. Updating of Belief in the Light of Someone Else's Opinion, 1980.

[4]  Alon Y. Halevy, et al. Answering queries using views: A survey, 2001, The VLDB Journal.

[5]  Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.

[6]  Felix Naumann, et al. Data fusion, 2009, CSUR.

[7]  Alexandros Labrinidis, et al. Exploring the tradeoff between performance and data freshness in database-driven Web servers, 2004, The VLDB Journal.

[8]  Felix Naumann, et al. Data Fusion in Three Steps: Resolving Inconsistencies at Schema-, Tuple-, and Value-level, 2006.

[9]  Dimitri Theodoratos, et al. Data Currency Quality Satisfaction in the Design of a Data Warehouse, 2001, Int. J. Cooperative Inf. Syst.

[10]  Joann J. Ordille, et al. Data integration: the teenage years, 2006, VLDB.

[11]  Amélie Marian, et al. Corroborating Answers from Multiple Web Sources, 2007, WebDB.

[12]  Erhard Rahm, et al. A survey of approaches to automatic schema matching, 2001, The VLDB Journal.

[13]  Anand Rajaraman, et al. Integrating Information by Outerjoins and Full Disjunctions, 1996, PODS 1996.

[14]  Hamid Pirahesh, et al. Canonical abstraction for outerjoin optimization, 2004, SIGMOD '04.

[15]  Ahmed K. Elmagarmid, et al. Duplicate Record Detection: A Survey, 2007, IEEE Transactions on Knowledge and Data Engineering.

[16]  Jennifer Widom, et al. Efficient Monitoring and Querying of Distributed, Dynamic Data via Approximate Replication, 2005, IEEE Data Eng. Bull.

[17]  César A. Galindo-Legaria, et al. Outerjoins as disjunctions, 1994, SIGMOD '94.

[18]  Shashi Shekhar, et al. Resolving attribute incompatibility in database integration: an evidential reasoning approach, 1994, Proceedings of 1994 IEEE 10th International Conference on Data Engineering.

[19]  W. Winkler. Overview of Record Linkage and Current Research Directions, 2006.

[20]  Linda G. DeMichiel, et al. Resolving Database Incompatibility: An Approach to Performing Relational Operations over Mismatched Domains, 1989, IEEE Trans. Knowl. Data Eng.

[21]  Hector Garcia-Molina, et al. Synchronizing a database to improve freshness, 2000, SIGMOD '00.

[22]  Ling Liu, et al. TrustMe: anonymous management of trust relationships in decentralized P2P systems, 2003, Proceedings Third International Conference on Peer-to-Peer Computing (P2P2003).

[23]  Philip S. Yu, et al. Truth Discovery with Multiple Conflicting Information Providers on the Web, 2007, IEEE Transactions on Knowledge and Data Engineering.

[24]  Laura M. Haas, et al. Beauty and the Beast: The Theory and Practice of Information Integration, 2007, ICDT.

[25]  Sergio Greco, et al. Integrating and Managing Conflicting Data, 2001, Ershov Memorial Conference.

[26]  Yehoshua Sagiv, et al. An incremental algorithm for computing ranked full disjunctions, 2005, PODS '05.

[27]  Pradeep Ravikumar, et al. A Comparison of String Distance Metrics for Name-Matching Tasks, 2003, IIWeb.

[28]  Philip A. Bernstein, et al. Model management 2.0: manipulating richer mappings, 2007, SIGMOD '07.

[29]  M. Tamer Özsu, et al. Conflict tolerant queries in AURORA, 1999, Proceedings Fourth IFCIS International Conference on Cooperative Information Systems. CoopIS 99 (Cat. No.PR00384).

[30]  Dan Suciu, et al. Probabilistic databases, 2011, SIGA.

[31]  Felix Naumann, et al. Conflict Handling Strategies in an Integrated Information System, 2006.

[32]  Hector Garcia-Molina, et al. The Eigentrust algorithm for reputation management in P2P networks, 2003, WWW '03.

[33]  Dennis V. Lindley, et al. Reconciliation of Probability Distributions, 1983, Oper. Res.

[34]  Allan Borodin, et al. Link analysis ranking: algorithms, theory, and experiments, 2005, TOIT.

[35]  Raghu Ramakrishnan, et al. Caching with 'Good Enough' Currency, Consistency, and Completeness, 2005, VLDB.

[36]  Felix Naumann, et al. Automatic Data Fusion with HumMer, 2005, VLDB.

[37]  Divesh Srivastava, et al. Integrating Conflicting Data: The Role of Source Dependence, 2009, Proc. VLDB Endow.

[38]  Divesh Srivastava, et al. Truth Discovery and Copying Detection in a Dynamic World, 2009, Proc. VLDB Endow.

[39]  Divesh Srivastava, et al. Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence, 2009, CIDR.

[40]  Felix Naumann, et al. Declarative Data Fusion - Syntax, Semantics, and Implementation, 2005, ADBIS.

[41]  Felix Naumann, et al. FuSem - Exploring Different Semantics of Data Fusion, 2007, VLDB.

[42]  Felix Naumann, et al. Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies, 2006, IEEE Data Eng. Bull.