Cultural Orientation: Classifying Subjective Documents by Cociation Analysis

This paper introduces a simple method for estimating cultural orientation, the affiliation of hypertext documents in a polarized field of discourse. Using a probabilistic model based on cocitation information, two experiments are reported. The first experiment tests the model’s ability to discriminate between leftand right-wing documents about politics. In this context the model is tested on three sets of data, 695 partisan web documents, 162 political weblogs, and 72 non-partisan documents. Accuracy above 90% is obtained from the cocitation model, outperforming lexically based classifiers at statistically significant levels. In the second experiment, the proposed method is used to classify the home pages of musical artists with respect to their mainstream or “alternative” appeal. For musical artists the model is tested on a set of 227 artist home pages, achieving 88 % accuracy.

[1]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[2]  Janyce Wiebe,et al.  Learning Subjective Adjectives from Corpora , 2000, AAAI/IAAI.

[3]  Jon M. Kleinberg,et al.  Inferring Web communities from link topology , 1998, HYPERTEXT '98.

[4]  Trevor J. Hastie,et al.  The Sentimental Factor: Improving Review Classification Via Human-Provided Information , 2004, ACL.

[5]  Vasileios Hatzivassiloglou,et al.  Predicting the Semantic Orientation of Adjectives , 1997, ACL.

[6]  Jonathan Furner,et al.  Scholarly communication and bibliometrics , 2005, Annu. Rev. Inf. Sci. Technol..

[7]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.

[8]  Hsinchun Chen,et al.  Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering , 2004, TOIS.

[9]  Chih-Jen Lin,et al.  A tutorial on?-support vector machines , 2005 .

[10]  Ramakrishnan Srikant,et al.  Mining newsgroups using networks arising from social behavior , 2003, WWW '03.

[11]  J. R. Firth,et al.  A Synopsis of Linguistic Theory, 1930-1955 , 1957 .

[12]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[13]  Henry G. Small,et al.  Co-citation in the scientific literature: A new measure of the relationship between two documents , 1973, J. Am. Soc. Inf. Sci..

[14]  Bradley N. Miller,et al.  GroupLens: applying collaborative filtering to Usenet news , 1997, CACM.

[15]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[16]  C. Borgman,et al.  Scholarly Communication and Bibliometrics. , 1992 .

[17]  LinChih-Jen,et al.  A tutorial on -support vector machines , 2005 .

[18]  Michael L. Littman,et al.  Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus , 2002, ArXiv.

[19]  Joseph A. Konstan,et al.  Introduction to recommender systems: Algorithms and Evaluation , 2004, TOIS.

[20]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[21]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.