Understanding and predicting Web content credibility using the Content Credibility Corpus

Abstract The goal of our research is to create a predictive model of Web content credibility evaluations, based on human evaluations. The model has to be based on a comprehensive set of independent factors that can be used to guide users’ credibility evaluations in crowdsourced systems like WOT, and also to design machine classifiers of Web content credibility. The factors described in this article are based on empirical data. We have created a dataset from an extensive crowdsourced Web credibility assessment study (over 15,000 evaluations of over 5,000 Web pages by over 2,000 participants). First, online participants evaluated a multi-domain corpus of selected Web pages. Using the acquired data and text mining techniques, we prepared a code book and conducted another crowdsourcing round to label the textual justifications of the earlier responses. We have extended the list of significant credibility assessment factors described in previous research and analyzed their relationships to credibility evaluation scores. The discovered factors that affect Web content credibility evaluations are also only weakly correlated with one another, which makes them more useful for modeling and predicting credibility evaluations. Based on the newly identified factors, we propose a predictive model for Web content credibility. The model can be used to determine the significance and impact of the discovered factors on credibility evaluations. These findings can guide future research on the design of automatic or semi-automatic systems for Web content credibility evaluation support. This study also contributes the largest credibility dataset currently publicly available for research: the Content Credibility Corpus (C3).
