Monitoring Social Media to Identify Environmental Crimes through NLP. A preliminary study

This paper presents the results of research carried out on the UNIOR Eye corpus, which was built by downloading tweets related to environmental crimes. The corpus comprises 228,412 tweets organized into four subsections, each concerning a specific environmental crime. The current study focuses on the waste crimes subsection, which contains 86,206 tweets, each tagged with one of two labels: alert or no alert. The aim is to build a model that predicts which of these two classes a tweet belongs to.
