A Constrained Maximum Likelihood Estimator for Unguided Social Sensing

This paper develops a constrained expectation maximization (CEM) algorithm that improves the accuracy of truth estimation in unguided social sensing applications. Unguided social sensing refers to the act of leveraging naturally occurring observations on social media as “sensor measurements” when the sources post at will and not in response to specific sensing campaigns or surveys. A key challenge in social sensing, in general, lies in estimating the veracity of reported observations when the sources reporting these observations are of unknown reliability and the observations themselves cannot be readily verified. This problem is known as fact-finding. Unsupervised solutions to the fact-finding problem have been proposed that explore notions of internal data consistency in order to estimate observation veracity. This paper observes that unguided social sensing gives rise to a new (and very simple) constraint that dramatically reduces the space of feasible fact-finding solutions, and hence significantly improves the quality of fact-finding results. The constraint relies on a simple approximate test of source independence, applicable to unguided sensing, and incorporates information about the number of independent sources of an observation to constrain the posterior estimate of its probability of correctness. Two different approaches are developed to test the independence of sources for purposes of applying this constraint, leading to two flavors of the algorithm, which we call CEM and CEM-Jaccard. We show, using both simulation and real data sets collected from Twitter, that forcing the algorithm to converge to a solution in which the constraint is satisfied significantly improves the quality of solutions.
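
As a rough illustration of the idea of counting independent sources per observation, the following is a minimal sketch of one plausible Jaccard-based independence test. It is not the paper's CEM-Jaccard procedure, which the abstract does not specify. The function names, the greedy selection rule, the 0.5 threshold, and the toy data are all assumptions made for illustration: two sources are treated as dependent when the Jaccard similarity of their claim sets reaches the threshold.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_independent_sources(claims_by_source, claim, threshold=0.5):
    """Greedily count sources reporting `claim` that look mutually independent.

    Assumption (hypothetical rule, not from the paper): a source is kept only
    if its claim set has Jaccard similarity below `threshold` with every
    source already kept.
    """
    # Sort for a deterministic result; greedy selection depends on order.
    reporters = sorted(s for s, claims in claims_by_source.items() if claim in claims)
    independent = []
    for s in reporters:
        if all(
            jaccard(claims_by_source[s], claims_by_source[t]) < threshold
            for t in independent
        ):
            independent.append(s)
    return len(independent)


# Toy data (hypothetical): "bob" repeats exactly what "alice" posts.
claims_by_source = {
    "alice": {"c1", "c2", "c3"},
    "bob": {"c1", "c2", "c3"},
    "carol": {"c1", "c7"},
    "dave": {"c8"},
}

n_c1 = count_independent_sources(claims_by_source, "c1")
n_c8 = count_independent_sources(claims_by_source, "c8")
```

In this toy example, three sources report `c1`, but only two are counted as independent because `bob` duplicates `alice`. A count of this kind is the sort of quantity that could be used to constrain the posterior estimate of a claim's correctness.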
