Implementation of GA-Based Feature Selection in the Classification and Mapping of Disaster-Related Tweets

Features extracted from Twitter messages were transformed into a feature vector matrix, to which feature selection using an improved Genetic Algorithm (GA) was applied. The selected features were then used to train and test the classifiers. The evaluation showed that the implemented feature selection method reduced the dimensionality of the feature space and increased the accuracy of Multinomial Naive Bayes. In addition, a web-based prototype built on the model was developed and used to analyze tweet data pertaining to natural disasters in the Philippines. The prototype showed potential for harnessing social media as a tool to help affected communities during natural disasters. This work may inform the development of more advanced IT-based disaster management applications.
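The wrapper idea described above (a GA searching over feature subsets, with Multinomial Naive Bayes accuracy as the fitness signal) can be sketched as follows. This is a minimal, generic illustration and not the improved GA of this work, whose details are not given here. The function names, the toy term-count data, the GA settings (population size, tournament selection, one-point crossover, bit-flip mutation, elitism), and the small subset-size penalty are all assumptions made for the example.

```python
import math
import random

def train_mnb(X, y, cols):
    """Fit Multinomial Naive Bayes (Laplace smoothing) on the selected columns."""
    classes = sorted(set(y))
    log_prior, log_like = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        counts = [sum(r[j] for r in rows) + 1 for j in cols]
        total = sum(counts)
        log_like[c] = [math.log(n / total) for n in counts]
    return classes, log_prior, log_like

def predict_mnb(model, x, cols):
    """Return the class with the highest log-posterior score."""
    classes, log_prior, log_like = model
    def score(c):
        return log_prior[c] + sum(x[j] * w for j, w in zip(cols, log_like[c]))
    return max(classes, key=score)

def fitness(mask, X_train, y_train, X_val, y_val, penalty=0.01):
    """Validation accuracy minus a small penalty on subset size (assumed form)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    model = train_mnb(X_train, y_train, cols)
    hits = sum(predict_mnb(model, x, cols) == t for x, t in zip(X_val, y_val))
    return hits / len(y_val) - penalty * len(cols) / len(mask)

def ga_select(X_train, y_train, X_val, y_val,
              pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Plain GA over binary feature masks; returns the best mask found."""
    rng = random.Random(seed)
    n = len(X_train[0])
    score = lambda m: fitness(m, X_train, y_train, X_val, y_val)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: carry over the two best masks unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)

# Hypothetical toy term counts; columns: "flood", "rescue", "lunch", "traffic".
X_train = [[2, 1, 0, 1], [1, 2, 1, 0], [3, 0, 0, 1], [1, 1, 1, 0],
           [0, 0, 2, 1], [0, 0, 1, 2], [0, 0, 3, 1], [0, 0, 1, 1]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = disaster-related, 0 = other
X_val = [[2, 0, 0, 1], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
y_val = [1, 1, 0, 0]

best = ga_select(X_train, y_train, X_val, y_val)
print("selected feature mask:", best)
```

In a realistic setting the mask would span thousands of vocabulary terms and the fitness would come from cross-validation rather than a single held-out split; the penalty term is one simple way to favor smaller subsets when accuracy is tied.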
