An Information Nutritional Label for Online Documents

With the proliferation of online information sources, it has become increasingly difficult to judge the trustworthiness of news found on the Web. The beauty of the Web is its openness, but this openness has led to a spread of false and unreliable information, presented in ways that make it difficult to detect. It may be impossible to decide definitively what is “real news” and what is “fake news”, since that question ultimately leads to a deep philosophical discussion of what is true and what is false. However, recent advances in natural language processing allow us to analyze information according to certain objective criteria (for example, the number of spelling errors). Here we propose creating an “information nutrition label” that can be automatically generated for any online text. Among others, the label provides information on the following computable criteria: factuality, virality, opinion, controversy, authority, technicality, and topicality. With this label, we hope to help readers make more informed judgments about the items they read.
