The liberal media and right-wing conspiracies: using cocitation information to estimate political orientation in web documents

This paper introduces a simple method for estimating cultural orientation: the affiliation of online entities in a polarized field of discourse. In particular, cocitation information is used to estimate the political orientation of hypertext documents. Political orientation, one type of cultural orientation, is the degree to which a document participates in traditionally left- or right-wing beliefs. Estimating documents' political orientation is of interest for personalized information retrieval and recommender systems. Applied to politics, the method uses a simple probabilistic model to estimate the strength of association between a document and the left- and right-wing communities. The model estimates the likelihood of cocitation between a document of interest and a small number of documents of known orientation. The model is tested on three data sets: 695 partisan web documents, 162 political weblogs, and 72 non-partisan documents. The cocitation model achieves accuracy above 90%, a statistically significant improvement over lexically based classifiers.
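The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of one way cocitation counts could be turned into an orientation estimate, not the paper's actual formulation. It assumes a smoothed log-odds ratio as the association measure; the function name `orientation_score`, its parameters, the smoothing constant, and the counts in the usage lines are all hypothetical.

```python
import math


def orientation_score(cocite_left, cocite_right, total_left, total_right,
                      smoothing=0.5):
    """Smoothed log-odds ratio of cocitation with right vs. left exemplars.

    Assumed inputs (hypothetical, not taken from the paper):
      total_left   -- number of pages citing the known left-wing exemplars
      cocite_left  -- how many of those pages also cite the document of interest
      total_right, cocite_right -- the same counts for right-wing exemplars

    A positive score suggests association with the right-wing community,
    a negative score with the left-wing community, and a score near zero
    suggests no clear orientation.
    """
    # Odds of being cocited with each community; smoothing avoids log(0)
    # and division by zero when a count is empty.
    odds_left = (cocite_left + smoothing) / (total_left - cocite_left + smoothing)
    odds_right = (cocite_right + smoothing) / (total_right - cocite_right + smoothing)
    return math.log(odds_right / odds_left)


if __name__ == "__main__":
    # Made-up counts purely for illustration.
    score = orientation_score(cocite_left=40, cocite_right=5,
                              total_left=1000, total_right=1000)
    label = "left" if score < 0 else "right"
    print(f"score={score:.3f} -> leans {label}")
```

A classifier built on such a score would threshold it at zero (or at a tuned cutoff), and the magnitude gives a rough sense of how strongly the document is tied to one community.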
