Web Credibility: Features Exploration and Credibility Prediction

The open nature of the World Wide Web makes it challenging for users to evaluate webpage credibility. In this paper, we aim to assess web credibility automatically by investigating various characteristics of webpages. Specifically, we first identify features from textual content, link structure, and webpage design, as well as social popularity learned from popular social media sites (e.g., Facebook, Twitter). A set of statistical analysis methods is applied to select the most informative features, which are then used to infer webpage credibility with supervised learning algorithms. Experiments on a real dataset under two application settings show that we attain an accuracy of 75% for classification and, for regression, a 53% improvement in mean absolute error (MAE) over the random baseline approach.
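
The regression result is stated as an MAE improvement relative to a random baseline. The following is a minimal sketch of that arithmetic, assuming the improvement is computed as (MAE_baseline - MAE_model) / MAE_baseline and that credibility ratings lie on a hypothetical 1-to-5 scale; the abstract specifies neither. The function names and toy values are illustrative and are not taken from the paper's data.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ratings."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_mae, model_mae):
    """Fraction by which the model reduces the baseline MAE (assumed definition)."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - model_mae) / baseline_mae


def random_baseline(n, low, high, seed=0):
    """Hypothetical random baseline: uniform guesses on the rating scale."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


if __name__ == "__main__":
    # Toy credibility ratings on an assumed 1-to-5 scale (not real data).
    ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.5, 2.0, 2.5, 4.5, 4.0]
    guesses = random_baseline(len(ratings), 1.0, 5.0)

    model_mae = mean_absolute_error(ratings, predicted)
    baseline_mae = mean_absolute_error(ratings, guesses)
    print(f"model MAE: {model_mae:.3f}")
    print(f"baseline MAE: {baseline_mae:.3f}")
    print(f"relative improvement: {relative_improvement(baseline_mae, model_mae):.1%}")
```

Under this definition, a 53% improvement would mean the model's MAE is 47% of the baseline's MAE.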
