Theme-Relevant Truth Discovery on Twitter: An Estimation Theoretic Approach

Twitter has emerged as a new application paradigm for sensing the physical environment by using humans as sensors. These human-sensed observations are often viewed as binary claims (either true or false). A fundamental challenge on Twitter is how to ascertain the credibility of claims and the reliability of sources without prior knowledge of either. This challenge is referred to as truth discovery. An important limitation exists in current Twitter-based truth discovery solutions: they do not explore the theme relevance of claims, so the correct claims they identify can be completely irrelevant to the theme of interest. In this paper, we present a new analytical model that explicitly considers the theme relevance of claims in solving the truth discovery problem on Twitter. The new model solves a bi-dimensional estimation problem to jointly estimate the correctness and theme relevance of claims as well as the reliability and theme awareness of sources. The new model is compared with truth discovery solutions in the current literature using three real-world datasets collected from Twitter during disastrous and emergent events in 2015: the Paris attack, the Oregon shooting, and the Baltimore riots. The new model is shown to be effective in finding claims that are both correct and relevant.
