On Quantifying the Accuracy of Maximum Likelihood Estimation of Participant Reliability in Social Sensing

This paper presents a confidence interval quantification of maximum likelihood estimation of participant reliability in social sensing applications. The work is motivated by the emergence of social sensing as a data collection paradigm in which humans perform the data collection tasks. A key challenge in social sensing applications lies in the uncertain nature of human measurements. Unlike well-calibrated and well-tested infrastructure sensors, humans are less reliable, and the likelihood that participants’ measurements are correct is often unknown a priori. Hence, it is hard to estimate the accuracy of conclusions drawn from social sensing data. In previous work, we developed a maximum likelihood estimator of the reliability of both participants and the facts concluded from the data. This paper presents an analytically founded bound that quantifies the accuracy of such maximum likelihood estimation in social sensing. A confidence interval is derived by leveraging the asymptotic normality of maximum likelihood estimation and computing an approximation of the Cramér-Rao bound (CRB) for the estimation parameters. The proposed quantification approach is empirically validated and shown to accurately bound the actual estimation error, given a sufficient number of participants, under different sensing topologies.
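The general recipe named above (asymptotic normality of the maximum likelihood estimate combined with the Cramér-Rao bound) can be illustrated with a deliberately simplified sketch. It is not the estimator from the paper: it assumes that the correctness of each report from a single participant is directly observable and that reports are independent Bernoulli trials, whereas in social sensing the ground truth is unknown. The function name `reliability_confidence_interval` and its parameters are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def reliability_confidence_interval(num_correct, num_reports, level=0.95):
    """Asymptotic confidence interval for a Bernoulli reliability parameter.

    Illustrative assumption: each report is an independent trial whose
    correctness is known, which is NOT the setting of the paper.
    """
    if num_reports <= 0:
        raise ValueError("num_reports must be positive")
    if not 0 <= num_correct <= num_reports:
        raise ValueError("num_correct must lie between 0 and num_reports")

    # Maximum likelihood estimate of the probability that a report is correct.
    p_hat = num_correct / num_reports

    # The Fisher information of n Bernoulli trials is n / (p * (1 - p)).
    # Its inverse is the Cramer-Rao bound, evaluated here at the MLE.
    crb = p_hat * (1.0 - p_hat) / num_reports

    # Asymptotic normality: p_hat is approximately N(p, CRB) for large n,
    # so the interval is p_hat +/- z * sqrt(CRB).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(crb)

    # Clip to [0, 1] because reliability is a probability.
    lower = max(0.0, p_hat - half_width)
    upper = min(1.0, p_hat + half_width)
    return p_hat, lower, upper


if __name__ == "__main__":
    estimate, lower, upper = reliability_confidence_interval(80, 100)
    print(f"reliability = {estimate:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Because the bound shrinks as the number of reports grows, the interval narrows with more data. This is the same qualitative behavior the abstract describes when it says the bound holds given a sufficient number of participants.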
