On truth discovery in social sensing: A maximum likelihood estimation approach

This paper addresses the challenge of truth discovery from noisy social sensing data. The work is motivated by the emergence of social sensing as a data collection paradigm of growing interest, in which humans perform sensory data collection tasks. A key challenge in social sensing applications lies in the noisy nature of the data. Unlike well-calibrated and well-tested infrastructure sensors, humans are less reliable, and the likelihood that participants' measurements are correct is often unknown a priori. Given a set of human participants of unknown reliability together with their sensory measurements, this paper poses the question of whether one can use this information alone to determine, in an analytically founded manner, the probability that a given measurement is true. The paper focuses on binary measurements. While some previous work approached the answer in a heuristic manner, we offer the first optimal solution to the above truth discovery problem. Optimality, in the sense of maximum likelihood estimation, is attained by solving an expectation maximization problem that returns the best guess regarding the correctness of each measurement. The approach is shown to outperform state-of-the-art fact-finding heuristics, as well as simple baselines such as majority voting.
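The abstract does not spell out the estimator, so the following is only a minimal sketch of how an expectation maximization loop for binary truth discovery might look, not the paper's exact formulation. It assumes a simple model in which each source i asserts a claim with probability a[i] when the claim is true and b[i] when it is false, and d is the prior probability that a claim is true. The function name `em_truth_discovery`, the `reports` matrix layout, and the vote-fraction initialization are all illustrative assumptions.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _clamp(p, eps):
    """Keep a probability away from 0 and 1 so logarithms stay finite."""
    return min(max(p, eps), 1.0 - eps)


def em_truth_discovery(reports, n_iter=50, eps=1e-6):
    """Illustrative EM sketch (assumed model, not the paper's exact one).

    reports[i][j] is 1 if source i asserted binary claim j, else 0.
    Returns (z, a, b, d): z[j] is the estimated probability that claim j
    is true; a[i], b[i] are the assumed per-source reporting rates.
    """
    n_sources = len(reports)
    n_claims = len(reports[0])

    # Assumption: initialize beliefs from the fraction of sources asserting
    # each claim, which also breaks the true/false label symmetry.
    z = [sum(reports[i][j] for i in range(n_sources)) / n_sources
         for j in range(n_claims)]
    a, b, d = [], [], 0.5

    for _ in range(n_iter):
        # M-step: re-estimate source parameters from current beliefs.
        z_sum = sum(z)
        d = _clamp(z_sum / n_claims, eps)
        a, b = [], []
        for i in range(n_sources):
            hit_true = sum(z[j] * reports[i][j] for j in range(n_claims))
            hit_false = sum((1.0 - z[j]) * reports[i][j]
                            for j in range(n_claims))
            a.append(_clamp(hit_true / max(z_sum, eps), eps))
            b.append(_clamp(hit_false / max(n_claims - z_sum, eps), eps))

        # E-step: posterior probability that each claim is true.
        for j in range(n_claims):
            log_true = math.log(d)
            log_false = math.log(1.0 - d)
            for i in range(n_sources):
                if reports[i][j]:
                    log_true += math.log(a[i])
                    log_false += math.log(b[i])
                else:
                    log_true += math.log(1.0 - a[i])
                    log_false += math.log(1.0 - b[i])
            z[j] = _sigmoid(log_true - log_false)

    return z, a, b, d


if __name__ == "__main__":
    # Toy data: three sources that mostly agree, and one that does not.
    toy = [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1, 1],
    ]
    beliefs, a_hat, b_hat, prior = em_truth_discovery(toy)
    print([round(p, 3) for p in beliefs])
```

Unlike majority voting, which weights every participant equally, a loop of this kind jointly estimates how much each source's assertions should count and how likely each claim is to be true. The fixed iteration count stands in for a proper convergence check, which a real implementation would likely add.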
