On scalability and robustness limitations of real and asymptotic confidence bounds in social sensing

This paper derives new confidence bounds on source reliability estimates in social sensing applications. Scalable and robust estimation of source reliability is a key challenge in social sensing, where humans or human-operated sensors act as data sources. To assess the correctness of data, the reliability of its sources must first be assessed. This is difficult when sources are not known and vetted a priori but can opt in at will, for example by downloading a sensing application to their mobile device. In our previous work, we developed a maximum likelihood estimator of source reliability and approximately quantified confidence in its estimates using an asymptotic Cramér-Rao lower bound (CRLB). In this paper, we show that the asymptotic bound fails to track estimation performance when the number of sources is small. We derive the real CRLB to accurately characterize estimation performance in scenarios where the asymptotic bound fails. We study the limitations of the real and asymptotic CRLBs and show the trade-offs they offer between computational complexity and estimation scalability. We also evaluate the robustness of these bounds to changes in the number of sources. The results offer an understanding of the attainable accuracy of source reliability estimation in social sensing applications that rely on unvetted sources whose reliability is not known in advance.
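
For orientation, the general textbook form of the Cramér-Rao bound mentioned above can be written as follows. This is the standard statement for an unbiased estimator, not the specific derivation of this paper; the symbols $\theta$ (the vector of source reliability parameters), $X$ (the observed data), and $J(\theta)$ (the Fisher information matrix) are notation assumed here for illustration only.

```latex
% Standard Cramer-Rao lower bound for an unbiased estimator \hat{\theta} of \theta.
% Notation (\theta, X, J) is assumed for illustration and is not taken from the paper.
\[
  \operatorname{Cov}\bigl(\hat{\theta}\bigr) \succeq J(\theta)^{-1},
  \qquad
  J(\theta) = -\,\mathbb{E}_{X}\!\left[ \nabla_{\theta}^{2} \log p(X;\theta) \right]
\]
% Under standard regularity conditions, the maximum likelihood estimator is
% asymptotically normal with covariance given by the inverse Fisher information:
\[
  \hat{\theta}_{\mathrm{MLE}} \;\overset{a}{\sim}\; \mathcal{N}\!\bigl(\theta,\; J(\theta)^{-1}\bigr)
\]
```

The second relation holds only in the limit of large amounts of data, which is consistent with the abstract's observation that an asymptotic bound may not track estimation performance when the number of sources is small.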
