Book Reviews: Natural Language Processing for Social Media by Atefeh Farzindar and Diana Inkpen

Today, social media refers to a wide range of Web sites and Internet-based services that allow users to create content and interact with other users. Some of these tools, such as multi-party chats, discussion forums, blogs, and online reviews, have long been a focus of natural language processing (NLP) research. Within the last decade, however, NLP work has expanded rapidly to cover an immense variety of new social media content: microblogs such as Twitter, social networks such as Facebook, comments on news articles, captions on user-contributed images such as those on Flickr, and forums dedicated to specialized topics and needs (e.g., health and online education). At the same time, many other research communities are working with social media data, including information science, information retrieval, network science, social media analytics, social science, psychology, and corpus linguistics. A large number of businesses are also centered on, or benefit from, analytics performed on social media. Given these myriad research and commercial interests, it is time to understand clearly what role NLP has in the field of social media analysis, both in terms of the key language questions it raises and in terms of the contributions NLP can make to research carried out in other fields. In this context, this short book by Farzindar and Inkpen is timely and exciting, filling an obvious gap.


[277]  Saif Mohammad,et al.  Using Hashtags to Capture Fine Emotion Categories from Tweets , 2015, Comput. Intell..

[278]  Jacob Eisenstein,et al.  Phonological Factors in Social Media Writing , 2013 .

[279]  Ee-Peng Lim,et al.  Comments-oriented blog summarization by sentence extraction , 2007, CIKM '07.

[280]  Adrian Popescu,et al.  Mining User Home Location and Gender from Flickr Tags , 2010, ICWSM.

[281]  Stephen Pulman,et al.  Evaluating the State of the Art , 1995 .

[282]  Elizabeth D. Liddy,et al.  Discerning Emotions in Texts , 2004, AAAI 2004.

[283]  D. Ruths,et al.  What's in a Name? Using First Names as Features for Gender Inference in Twitter , 2013, AAAI Spring Symposium: Analyzing Microtext.

[284]  G. A. Mishne,et al.  Expiriments with mood classification in blog posts , 2005, SIGIR 2005.

[285]  Wei Wu,et al.  Automatic Generation of Personalized Annotation Tags for Twitter Users , 2010, NAACL.

[286]  Shlomo Argamon,et al.  Effects of Age and Gender on Blogging , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[287]  Ido Dagan,et al.  Synthesis Lectures on Human Language Technologies , 2009 .

[288]  Kalina Bontcheva,et al.  Where's @wally?: a classification approach to geolocating users based on their social ties , 2013, HT.

[289]  Nizar Habash,et al.  MADA + TOKAN : A Toolkit for Arabic Tokenization , Diacritization , Morphological Disambiguation , POS Tagging , Stemming and Lemmatization , 2009 .

[290]  Liam Peyton,et al.  Personal Health Information Leak Prevention in Heterogeneous Texts , 2009 .

[291]  Philip J. Stone,et al.  The general inquirer: A computer system for content analysis and retrieval based on the sentence as a unit of information , 2007 .

[292]  Victoria Bobicev,et al.  Learning Sentiments from Tweets with Personal Health Information , 2012, Canadian Conference on AI.

[293]  Eric P. Xing,et al.  Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2014, ACL 2014.

[294]  Mitsuru Ishizuka,et al.  Compositionality Principle in Recognition of Fine-Grained Emotions from Text , 2009, ICWSM.

[295]  Heng Ji,et al.  Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media , 2013, ACL.

[296]  M. de Rijke,et al.  Adding semantics to microblog posts , 2012, WSDM '12.

[297]  Enrique Alfonseca,et al.  Description of the Google update summarizer at TAC-2011 , 2011, TAC.

[298]  Vincent Martin,et al.  Predicting the French Stock Market Using Social Media Analysis , 2013, 2013 8th International Workshop on Semantic and Social Media Adaptation and Personalization.

[299]  Ana-Maria Popescu,et al.  Extracting events and event descriptions from Twitter , 2011, WWW.

[300]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[301]  Marc Najork,et al.  Boot-Strapping Language Identifiers for Short Colloquial Postings , 2013, ECML/PKDD.

[302]  Adam Kilgarriff,et al.  Cleaneval: a Competition for Cleaning Web Pages , 2008, LREC.