Predicting webpage credibility using linguistic features

This article focuses on predicting trustworthiness from the textual content of webpages. Recent work by Olteanu et al. proposes a number of features (linguistic and social) for applying machine learning methods to recognize trust levels. We demonstrate that this approach can be substantially improved in two ways: by applying machine learning methods to vectors computed from psychosocial and psycholinguistic features, and by applying them in a high-dimensional bag-of-words paradigm of word occurrences. Following Olteanu et al., we test the methods in two classification settings, a 2-class and a 3-class scenario, and in a regression setting. In the 3-class scenario, the features compiled by Olteanu et al. achieve a weighted precision of 0.63, while the methods proposed in our paper raise it to 0.66 and 0.70. We also examine the coefficients of the models in order to discover words associated with low and high trust.
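The weighted precision figures quoted above can be read with a small worked example. The sketch below assumes the usual definition, in which per-class precision is averaged with weights equal to each class's share of the true labels; the abstract does not state the exact formula. The function name `weighted_precision`, the label names, and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Average per-class precision, weighted by each class's share of true labels."""
    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # Assumed convention: a class that is never predicted gets precision 0.
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        score += (n_true / total) * precision
    return score


# Hypothetical 3-class trust labels, invented purely for illustration.
y_true = ["low", "low", "medium", "medium", "medium", "high"]
y_pred = ["low", "medium", "medium", "medium", "high", "high"]

print(weighted_precision(y_true, y_pred))
```

On this toy data the per-class precisions are 1 (low), 2/3 (medium), and 1/2 (high), with weights 2/6, 3/6, and 1/6, which gives 0.75 up to floating-point rounding.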

[1] Edward F. Kelly, et al. Computer recognition of English word senses, 1975.

[2] Adam Wierzbicki, et al. Temporal, Cultural and Thematic Aspects of Web Credibility, 2013, SocInfo.

[3] Marshall S. Smith, et al. The general inquirer: A computer approach to content analysis, 1967.

[4] Clark Leavitt, et al. The Persuasive Effect of Source Credibility: Tests of Cognitive Response, 1978.

[5] D. R. Danielson, et al. How do users evaluate the credibility of Web sites?: a study with over 2,500 participants, 2003, DUX '03.

[6] Gordon L. Patzer, et al. Source credibility as a function of communicator physical attractiveness, 1983.

[7] Deborah G. Johnson. Ethics online, 1997.

[8] J. Sobel. A Theory of Credibility, 1985.

[9] William Allen, et al. The influence of source credibility on communication effectiveness, 1953.

[10] Meredith Ringel Morris, et al. Augmenting web pages and search results to support credibility assessment, 2011, CHI.

[11] Katherine Del Giudice. Crowdsourcing credibility: The impact of audience feedback on Web page credibility, 2010, ASIST.

[12] C. Gaziano, et al. Measuring the Concept of Credibility, 1986.

[13] Chanthika Pornpitakpan. The Persuasiveness of Source Credibility: A Critical Review of Five Decades' Evidence, 2004.

[14] B. J. Fogg, et al. The elements of computer credibility, 1999, CHI '99.

[15] B. J. Fogg, et al. Prominence-interpretation theory: explaining how people assess credibility online, 2003, CHI Extended Abstracts.

[16] Karl Aberer, et al. Web Credibility: Features Exploration and Credibility Prediction, 2013, ECIR.

[17] Philip J. Stone, et al. Extracting Information. (Book Reviews: The General Inquirer. A Computer Approach to Content Analysis), 1967.

[18] B. J. Fogg, et al. Credibility and computing technology, 1999, CACM.

[19] Scott Counts, et al. Tweeting is believing?: understanding microblog credibility perceptions, 2012, CSCW.

[20] Klaus Krippendorff, et al. Content Analysis: An Introduction to Its Methodology, 1980.