Automated Credibility Assessment of Web Page Based on Genre

With more than a billion web sites, the volume and variety of content available for consumption is huge. However, credibility, an important quality characteristic of web pages, is questionable in many cases and tends to be non-uniform. Credibility can increase or reduce the importance of a web page, leading to a potential gain or loss of user base. Assessing credibility without factoring in the genre of the content (for example, Help, Article, Discussion, etc.) can lead to an incorrect assessment. Depending on the genre, the importance of features such as the page's last-modified date and time, grammar, image-to-text ratio, inbound and outbound links, and other web page features differs. We propose a genre-based credibility assessment that uses web page surface features and their importance within a genre. Further, we built the WEBCred framework to assess a GCS (Genre-based Credibility Score), with the flexibility to add or modify genres, their features, and the importance of those features. We validated our approach on 10,429 ‘Information Security’ related web pages; the assessed scores showed a 35% correlation with the crowdsourced Web of Trust (WOT) score and a 39% correlation with Alexa ranking.
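
As a rough illustration of how a genre-dependent score could combine surface features, the sketch below computes a weighted average of feature values using per-genre weights. This is not the WEBCred implementation: the function name, the feature names, the weight values, and the assumption that every feature is already normalized to the range 0 to 1 are all hypothetical and chosen only for the example.

```python
from typing import Dict

# Hypothetical per-genre feature weights (illustrative values only,
# not taken from the paper or from WEBCred).
GENRE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "Article": {"last_modified": 0.4, "grammar": 0.4, "out_links": 0.2},
    "Help": {"last_modified": 0.2, "grammar": 0.3, "out_links": 0.5},
}


def genre_credibility_score(
    features: Dict[str, float],
    genre: str,
    weights: Dict[str, Dict[str, float]] = GENRE_WEIGHTS,
) -> float:
    """Weighted average of normalized feature values for one genre.

    Assumes each feature value is already scaled to [0, 1].
    A feature missing from `features` is treated as 0.
    """
    if genre not in weights:
        raise KeyError(f"unknown genre: {genre}")
    genre_weights = weights[genre]
    total = sum(genre_weights.values())
    if total == 0:
        raise ValueError("genre weights must not sum to zero")
    weighted = sum(w * features.get(name, 0.0) for name, w in genre_weights.items())
    return weighted / total


if __name__ == "__main__":
    page = {"last_modified": 1.0, "grammar": 0.5, "out_links": 0.0}
    # The same page scores differently depending on the genre it is judged under.
    print(genre_credibility_score(page, "Article"))
    print(genre_credibility_score(page, "Help"))
```

Keeping the weights in a plain dictionary keyed by genre mirrors the flexibility described above: adding a genre or changing the importance of a feature is a data change, not a code change.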
