A survey on data fusion: what for? in what form? what is next?

Data fusion is the process of merging records that come from multiple sources and represent the same real-world object into a single representation. This literature review concerns data fusion in the context of data integration, i.e., the integration of structured and semi-structured data from the same domain, and provides an overview of this field of research. We present why data fusion is becoming increasingly necessary and what it is used for (What for?), what methods and solutions for data fusion have been proposed in the literature (In what form?), and what research challenges remain open in the data fusion area and which directions future research could usefully take (What is next?).
