Maximum likelihood analysis of conflicting observations in social sensing

This article addresses the challenge of truth discovery from noisy social sensing data. The work is motivated by the emergence of social sensing as a data collection paradigm of growing interest, in which humans perform sensory data collection tasks. Unlike well-calibrated and well-tested infrastructure sensors, humans are less reliable, and the likelihood that participants' measurements are correct is often unknown a priori. Given a set of human participants of unknown trustworthiness together with their sensory measurements, we ask whether this information alone can be used to determine, in an analytically founded manner, the probability that a given measurement is true. Our previous conference paper offered the first maximum likelihood solution to this truth discovery problem for corroborating observations only. This article extends that paper and provides the first maximum likelihood solution for cases where measurements from different participants may conflict. The article focuses on binary measurements. The approach is shown to outperform our previous work for corroborating observations, state-of-the-art fact-finding baselines, and simple heuristics such as majority voting.
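For orientation, the majority-voting heuristic named above as a baseline can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's maximum likelihood method: the report format `(participant, claim_id, value)`, the function name `majority_vote`, the tie-breaking rule, and the sample data are all hypothetical.

```python
from collections import defaultdict


def majority_vote(reports):
    """Decide each binary claim by simple majority over conflicting reports.

    reports: iterable of (participant, claim_id, value), value in {True, False}.
    Returns {claim_id: bool}. Ties resolve to False (an arbitrary assumption).
    """
    # claim_id -> [number of False reports, number of True reports]
    tallies = defaultdict(lambda: [0, 0])
    for _participant, claim_id, value in reports:
        tallies[claim_id][int(bool(value))] += 1
    return {claim: counts[1] > counts[0] for claim, counts in tallies.items()}


# Hypothetical reports: participants disagree on both claims.
reports = [
    ("alice", "road_blocked", True),
    ("bob", "road_blocked", True),
    ("carol", "road_blocked", False),
    ("alice", "bridge_open", False),
    ("carol", "bridge_open", True),
]
decisions = majority_vote(reports)
print(decisions)
```

Majority voting weights every participant equally, which is the limitation a reliability-aware estimator is meant to address: it ignores the fact that participants differ in how trustworthy they are.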
