Detecting new, informative propositions in social media

The ever-growing quantity of online text makes it increasingly challenging to find new, important, or useful information. This is especially so when topics of potential interest are not known a priori, as with breaking news stories. This thesis examines techniques for detecting the emergence of new, interesting information in Social Media. It sets the investigation in the context of a hypothetical knowledge discovery and acquisition system, and addresses two objectives: the detection of new topics, and the filtering of non-informative text from Social Media. A rolling time-slicing approach is proposed for discovery, in which daily frequencies of nouns, named entities, and multiword expressions are compared to their expected daily frequencies, as estimated from previous days using a Poisson model. Trending features in Social Media, those showing a significant surge in use, are potentially interesting. Of these, features that have not shown a similar recent surge in News are selected as indicative of new information. It is demonstrated that surges in nouns and news entities can be detected that predict corresponding surges in mainstream news. Co-occurring trending features are used to create clusters of potentially topic-related documents. Clusters formed from co-occurrences of named entities are shown to be the most topically coherent. Machine-learning-based filtering models are proposed for finding informative text in Social Media. News/Non-News and Dialogue Act models are explored using the News-annotated Redites corpus of Twitter messages. A simple 5-act Dialogue scheme, used to annotate a small sample of that corpus, is presented. For both the News/Non-News and the Informative/Non-Informative classification tasks, using non-lexical message features produces more discriminative and robust classification models than using message terms alone. The combination of all investigated features yields the most accurate models.
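
The surge test described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that the expected daily frequency of a feature is the mean of its counts over the preceding days, that a surge is a count whose upper-tail Poisson probability falls below a threshold, and that a feature absent from News has a count of zero there. The function names, the threshold `alpha`, and the rate floor `min_rate` are all hypothetical choices made for illustration.

```python
import math
from statistics import mean


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X < k)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    log_pmf = -lam                      # log P(X = 0)
    cdf = math.exp(log_pmf)
    for i in range(1, k):
        # Recurrence: P(X = i) = P(X = i - 1) * lam / i, kept in log space.
        log_pmf += math.log(lam) - math.log(i)
        cdf += math.exp(log_pmf)
    return max(0.0, 1.0 - cdf)


def is_trending(history, today, alpha=0.001, min_rate=0.5):
    """Flag a surge when today's count is improbably high under a Poisson
    model whose rate is the mean of the previous days' counts.

    alpha and min_rate are illustrative assumptions, not values from the text.
    """
    lam = max(mean(history), min_rate)  # floor avoids a zero rate for rare features
    return poisson_tail(today, lam) < alpha


def new_information_features(social_hist, social_today, news_hist, news_today):
    """Select features that surge in Social Media but not in News."""
    selected = []
    for feature, history in social_hist.items():
        if not is_trending(history, social_today.get(feature, 0)):
            continue
        # A feature never seen in News is treated as having zero counts there.
        n_hist = news_hist.get(feature, [0])
        n_today = news_today.get(feature, 0)
        if not is_trending(n_hist, n_today):
            selected.append(feature)
    return selected


# Usage with made-up daily counts for two features over seven previous days.
social_hist = {"earthquake": [1, 0, 2, 1, 1, 0, 1],
               "football": [20, 22, 19, 21, 20, 23, 21]}
social_today = {"earthquake": 14, "football": 22}
news_hist = {"earthquake": [0, 1, 0, 0, 1, 0, 0]}
news_today = {"earthquake": 1}
print(new_information_features(social_hist, social_today, news_hist, news_today))
```

A rolling window falls out of this naturally: each day, the oldest count is dropped from a feature's history and the newest is appended before the test is repeated. Computing the tail as one minus the cumulative sum loses precision for extremely small probabilities, which is acceptable for a thresholding sketch but would call for a library survival function in practice.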
