Proceedings of the 4th Workshop on Making Sense of Microposts (#Microposts2014)

Big things come in small packages

According to the Computational Social Science Society of the Americas (CSSSA), computational social science is "The science that investigates social phenomena through the medium of computing and related advanced information processing technologies". Positioned between the computer and social sciences, this new and emerging interdisciplinary field is fuelled by at least two developments: (i) availability of data: with the web, a huge volume of social data is now available, which enables the study of traces of social interaction at new scales; and (ii) increasing quantification of social theories: with recent advances in the social sciences, social theories are becoming increasingly formal and/or mathematical and thus amenable to quantification. Taken together, these two developments give rise to a whole range of new and interesting problems at the intersection of the computer and social sciences.

While a multitude of social data is available on the World Wide Web, microblogs are of particular interest due to their real-time nature, their rich social fabric and their presumed on/offline coupling. In this talk, I will discuss the potential and the challenges of doing computational social science based on data obtained from microblogs such as Twitter. In particular, I want to present previous work by my group and others in order to identify research avenues where progress has already been made or is on the horizon, and to contrast these with what I consider to be open research challenges in this emerging field.

Work that demonstrates the potential of microblogs for computational social science includes, for example, [1], where we operationalized a number of theoretical constructs from sociology to characterize the nature of online conversational practices of political parties on Twitter. In other work, we studied the ways in which users' fields of expertise can be inferred from microblog data [4]. Work that demonstrates the pitfalls and challenges of doing computational social science with microblog data includes, for example, [5], where we studied a network of bots competing against each other in attacking users on Twitter. In subsequent work, we found that such attacks have the potential to impact the social graph of Twitter [3], i.e. the network of who follows whom and who replies to whom, respectively. In other work, the authors of [2] have shown that there is a stark difference between the demographics of Twitter users and the general population of the US, finding that Twitter users significantly over-represent densely populated regions and are predominantly male. I will argue that these and other factors need to be considered if we aim to unlock the full potential of microblog data for computational social science.

[1] Stefan Dlugolinsky et al. Evaluation of named entity recognition tools on microposts. 2013 IEEE 17th International Conference on Intelligent Engineering Systems (INES), 2013.

[2] Leo Breiman et al. Random Forests. Machine Learning, 2001.

[3] Asif Ekbal et al. Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition. Data Knowl. Eng., 2013.

[4] Philip S. Yu et al. Adding the temporal dimension to search - a case study in publication search. The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), 2005.

[5] Saso Dzeroski et al. Combining Classifiers with Meta Decision Trees. Machine Learning, 2003.

[6] Ryan Cotterell et al. Nerit: Named Entity Recognition for Informal Text. 2013.

[7] J. Ross Quinlan et al. C4.5: Programs for Machine Learning. 1992.

[8] Doug Downey et al. Local and Global Algorithms for Disambiguation to Wikipedia. ACL, 2011.

[9] Aba-Sah Dadzie et al. Making Sense of Microposts (#MSM2013) Concept Extraction Challenge. #MSM, 2013.

[10] Bu-Sung Lee et al. TwiNER: named entity recognition in targeted twitter stream. SIGIR '12, 2012.

[11] Xiaojun Wan et al. Single Document Keyphrase Extraction Using Neighborhood Knowledge. AAAI, 2008.

[12] Tong Zhang et al. Named Entity Recognition through Classifier Combination. CoNLL, 2003.

[13] Oren Etzioni et al. Named Entity Recognition in Tweets: An Experimental Study. EMNLP, 2011.

[14] Yang Song et al. Topical Keyphrase Extraction from Twitter. ACL, 2011.

[15] Kalina Bontcheva et al. Making sense of social media streams through semantics: A survey. Semantic Web, 2014.

[16] Ming Zhou et al. Recognizing Named Entities in Tweets. ACL, 2011.

[17] Ian H. Witten et al. An open-source toolkit for mining Wikipedia. Artif. Intell., 2013.

[18] Dan Roth et al. Design Challenges and Misconceptions in Named Entity Recognition. CoNLL, 2009.

[19] Christopher D. Manning et al. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. ACL, 2005.

[20] Furu Wei et al. HyperSum: hypergraph based semi-supervised sentence ranking for query-oriented summarization. CIKM, 2009.

[21] Elaine Marsh et al. MUC-7 Evaluation of IE Technology: Overview of Results. MUC, 1998.

[22] Luo Si et al. Boosting performance of bio-entity recognition by combining results from multiple systems. BIOKDD, 2005.

[23] Ramesh Nallapati et al. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. EMNLP, 2009.

[24] Maurice van Keulen et al. Concept Extraction Challenge: University of Twente at #MSM2013. #MSM, 2013.