Graph Based Semi-supervised Learning with Convolution Neural Networks to Classify Crisis Related Tweets

During time-critical situations such as natural disasters, rapid classification of data posted on social networks by affected people helps humanitarian organizations gain situational awareness and plan response efforts. However, the scarcity of labeled data in the early hours of a crisis hinders machine learning tasks and thus delays crisis response. In this work, we propose an inductive semi-supervised technique that uses unlabeled data, which is often abundant at the onset of a crisis event, along with a smaller amount of labeled data. Specifically, we adopt a graph-based deep learning framework to learn an inductive semi-supervised model. We use two real-world crisis datasets from Twitter to evaluate the proposed approach. Our results show significant improvements when unlabeled data are used alongside labeled data, compared to using labeled data alone.
