Using Twitter for Demographic and Social Science Research: Tools for Data Collection and Processing

Despite growing interest in using Twitter to examine human behavior and attitudes, researchers' ability to leverage Twitter data for social science research remains limited. In particular, gleaning demographic information about Twitter users (a key component of much social science research) remains a challenge. This article develops an accurate and reliable data processing approach for social science researchers interested in using Twitter data to examine behaviors and attitudes, as well as the demographic characteristics of the populations expressing or engaging in them. Using information gathered from Twitter users who state an intention not to vote in the 2012 presidential election, we describe a method for processing data to retrieve demographic information reported by users that is not encoded as text (e.g., details of images) and evaluate the reliability of this method. We end by assessing the challenges of this data collection strategy and discussing how large-scale social media data may benefit demographic researchers.
