A Survey on Truth Discovery

Thanks to the information explosion, data about objects of interest can be collected from an ever-growing number of sources. However, the information collected from different sources about the same object often conflicts. To tackle this challenge, truth discovery, which integrates noisy multi-source information by estimating the reliability of each source, has emerged as a hot topic. Several truth discovery methods have been proposed for various scenarios, and they have been applied successfully in diverse application domains. In this survey, we provide a comprehensive overview of truth discovery methods and summarize them from different aspects. We also discuss some future directions for truth discovery research. We hope that this survey will promote a better understanding of current progress in truth discovery and offer some guidelines on how to apply these approaches in application domains.
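To make the core idea concrete, the following is a minimal sketch of one generic way to couple truth estimation with source reliability estimation: alternate between a reliability-weighted vote per object and a reliability update based on agreement with the current estimates. It is an illustration only and does not reproduce any specific method covered by the survey. The function name `discover_truths`, the input layout, the smoothed-agreement weight update, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def discover_truths(claims, iterations=10):
    """Hypothetical helper: claims maps source -> {object: claimed value}.

    Returns (truths, weights), where truths maps each object to its
    estimated value and weights maps each source to a reliability score.
    """
    # Start by trusting every source equally.
    weights = {source: 1.0 for source in claims}
    truths = {}
    for _ in range(iterations):
        # Step 1: estimate each object's truth by a reliability-weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for source, observations in claims.items():
            for obj, value in observations.items():
                votes[obj][value] += weights[source]
        truths = {obj: max(vals, key=vals.get) for obj, vals in votes.items()}

        # Step 2: re-estimate each source's reliability as the smoothed
        # fraction of its claims that agree with the current truths
        # (the +1/+2 smoothing is an assumption, chosen to avoid 0 or 1).
        for source, observations in claims.items():
            agree = sum(1 for obj, value in observations.items()
                        if truths[obj] == value)
            weights[source] = (agree + 1) / (len(observations) + 2)
    return truths, weights


# Toy data (invented for illustration): three sources, three objects.
claims = {
    "A": {"x": 1, "y": 2, "z": 3},
    "B": {"x": 1, "y": 2, "z": 4},
    "C": {"x": 9, "y": 2, "z": 4},
}
truths, weights = discover_truths(claims)
print(truths)
print(weights)
```

In this toy input, source B agrees with the estimated value on every object, so it ends up with a higher reliability score than A or C. Actual truth discovery methods differ in how they model reliability, handle continuous or heterogeneous values, and account for dependence between sources.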
