Identifying ideological perspectives in text and video

Polarizing opinions about political and social controversies commonly appear in mass media and, more recently, in user-generated media. A functional democratic society builds on civic discussion among people who hold different beliefs on an issue. So far, however, few computer technologies have been devoted to facilitating mutual understanding, and some arguably may have worsened the situation. We envision a computer system that can automatically understand different ideological viewpoints on an issue and identify biased news stories, blog posts, and television news. Such a system would raise news readers' awareness of individual sources' biases and encourage them to seek out news stories from different viewpoints.

Computer understanding of ideological perspectives, however, has long been considered almost impossible. In this thesis, we show that ideology, although very abstract, exhibits a concrete pattern when it is communicated among a group of people who share similar beliefs, whether in written text, spoken text, television news production, or web video folksonomies. This emphatic pattern in ideological discourse opens up a new field of automatic ideological analysis and enables large amounts of ideological text and video to be analyzed automatically.

We develop a new statistical model, called the Joint Topic and Perspective Model, based on the emphatic pattern in ideological discourse. The model combines two essential aspects of ideological discourse: subject matter (topics) and ideological bias. Simultaneous inference over topics and ideological emphasis, however, poses a computational challenge, so we develop an approximate inference algorithm for the model based on variational methods.

The emphatic pattern in ideological discourse and the Joint Topic and Perspective Model enable many interesting applications in text analysis and multimedia content understanding.
At the corpus level, we show that ideological discourse can be reliably distinguished from non-ideological discourse. At the document level, we show that the perspective from which a document is written or a video is produced can be identified with high accuracy. At the sentence level, we extend the model to summarize an ideological document by selecting sentences that strongly express a particular perspective.
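The abstract does not give the model's equations, so the following is only a minimal sketch under a stated assumption: each word has a topical weight `tau[w]` shared by all perspectives and an ideological emphasis `phi[v, w]` specific to perspective `v`, and the two are combined multiplicatively and normalized with a softmax to give a per-perspective word distribution `beta[v]`. The names `tau`, `phi`, `beta`, `prior`, and all three functions are hypothetical. Parameter estimation (the variational inference described above) is omitted; the weights are simply given. The sketch shows how such distributions could support document-level perspective identification and sentence-level selection.

```python
import numpy as np


def perspective_word_dists(tau, phi):
    """Per-perspective word distributions (hypothetical log-linear form).

    tau: shape (V,), topical weight of each word, shared by all perspectives.
    phi: shape (P, V), ideological emphasis of each word under each perspective.
    Returns beta with shape (P, V), where beta[v] = softmax(tau * phi[v]).
    """
    logits = np.asarray(tau, dtype=float)[None, :] * np.asarray(phi, dtype=float)
    logits -= logits.max(axis=1, keepdims=True)  # stabilize exp()
    beta = np.exp(logits)
    return beta / beta.sum(axis=1, keepdims=True)


def identify_perspective(counts, beta, prior):
    """Document level: most probable perspective for a bag-of-words count vector."""
    # log p(v) + sum_w n_w * log beta[v, w], one value per perspective v
    log_joint = np.log(prior) + np.log(beta) @ counts
    return int(np.argmax(log_joint))


def select_sentences(sentence_counts, beta, target, k=1):
    """Sentence level: indices of the k sentences that most favor `target`.

    Assumed scoring rule: log-likelihood under `target` minus the best
    log-likelihood under any other perspective.
    """
    log_beta = np.log(beta)
    scores = []
    for counts in sentence_counts:
        ll = log_beta @ counts  # one log-likelihood per perspective
        scores.append(ll[target] - np.delete(ll, target).max())
    order = np.argsort(scores)[::-1]  # highest score first
    return [int(i) for i in order[:k]]


if __name__ == "__main__":
    # Toy 3-word vocabulary and two perspectives; all numbers are made up.
    tau = np.array([1.0, 1.0, 1.0])
    phi = np.array([[2.0, 1.0, 0.0],
                    [0.0, 1.0, 2.0]])
    beta = perspective_word_dists(tau, phi)
    prior = np.array([0.5, 0.5])
    print(identify_perspective(np.array([3, 1, 0]), beta, prior))  # 0
    sentences = [np.array([0, 1, 0]), np.array([1, 0, 0]), np.array([0, 0, 1])]
    print(select_sentences(sentences, beta, target=0, k=2))  # [1, 0]
```

One consequence of the multiplicative form assumed here: a word whose topical weight is zero gets the same logit under every perspective, so it carries no ideological signal, and emphasis only matters on words that are topically prominent.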
