Text mining and social media: when quantitative meets qualitative, and software meets humans

The ongoing production of staggeringly huge volumes of digital data is a ubiquitous part of life in the early twenty-first century. A large proportion of this data is text. This development has serious implications for almost all scholarly endeavour. It is now possible for researchers from a wide range of disciplines to use text mining techniques and software tools in their daily practice. In our own field of political communication, the prospect of cheap access to what, how, and to whom very large numbers of citizens communicate in social media environments provides opportunities that are too good to miss as we seek to understand how and why citizens think and feel the way they do about policies, political organizations, and political events. But what are the methods and tools on offer, how should they best be used, and what sorts of ethical issues are raised by their use?
