Theory-Grounded Computational Text Analysis

In this position paper, we argue that computational text analysis lacks and requires organizing principles. A broad space separates its two constituent disciplines, natural language processing and social science. To date, this space has been sidestepped, rather than filled, by applying increasingly complex computational models to problems in social science research. We contrast descriptive and integrative findings, and our review of approximately 60 papers on computational text analysis reveals that those from *ACL venues are typically descriptive. The lack of theory began at the area’s inception and has, over the decades, grown more important and challenging. A return to theoretically grounded research questions will propel the area from both theoretical and methodological points of view.
