Twitter as Data

The rise of the internet and mobile telecommunications has made it possible to use large datasets to understand behavior at unprecedented levels of temporal and geographic resolution. Users of these technologies generate data through many sources, e.g. call detail records, blog posts, web forums, and content aggregation sites, but online social networks attract the most users. These data allow scholars to adjudicate between competing theories and to develop new ones, much as the microscope facilitated the development of the germ theory of disease. Among online social networks, Twitter presents an ideal combination of size, international reach, and data accessibility that has made it the preferred platform for academic studies. Acquiring, cleaning, and analyzing these data, however, require new tools and processes. This Element introduces these methods to social scientists and provides scripts and examples for downloading, processing, and analyzing Twitter data.

1 Department of Public Policy, University of California, Los Angeles, 337 Charles Young Drive East, Los Angeles, CA 90095
∗To whom correspondence should be addressed: zst at luskin dot ucla edu
