Análise de Métodos e Ferramentas para Reconhecimento de Palavras Relevantes em Microblogs [Analysis of Methods and Tools for Relevant Words Recognition in Microblogs]

Extracting accurate information from the huge volumes of data generated in social media, much of it unstructured, is currently a major challenge. However, it has several relevant applications, some of them still latent. One of the first and most decisive steps in this information extraction process is the recognition of relevant words in texts. This article presents a comparative study of methods and tools for recognizing relevant words in microblog posts. Among the several tools analyzed, five were selected for experiments with 100,000 tweets. These experiments showed high variability in the results generated by the different tools, suggesting a need for improvements.
