"What else are you worried about?" – Integrating textual responses into quantitative social science research

Open-ended questions are routinely included in large-scale survey and panel studies, yet it remains unclear how best to incorporate the answers to such questions into quantitative social science research. Recently developed tools from natural language processing offer a wide range of options for the automated analysis of such textual data, but their adoption has lagged behind. In this study, we demonstrate straightforward procedures for processing and analyzing textual data for the purposes of quantitative social science research. Using more than 35,000 textual answers to the question “What else are you worried about?” from participants of the German Socio-economic Panel Study (SOEP), we (1) analyzed which respondent characteristics determined whether they answered the open-ended question, (2) used the textual data to detect relevant topics reported by the respondents, and (3) linked respondent characteristics to the worries they reported in their answers. We discuss the potential uses as well as the limitations of the automated analysis of textual data.
