CWI: A multimodal deep learning approach for named entity recognition from social media using character, word and image features

Named Entity Recognition (NER) from social media posts is a challenging task. User-generated content, which makes up most of social media, is noisy and contains grammatical and linguistic errors. This noise makes tasks such as named entity recognition much harder. We propose two novel deep learning approaches, one based on multimodal deep learning and one based on Transformers. Both approaches use image features from short social media posts to improve results on the NER task. In the first approach, we extract image features using InceptionV3 and use fusion to combine textual and image features. This yields more reliable named entity recognition when the user provides images related to the entities. In the second approach, we combine image features with text and feed the result into a BERT-like Transformer. The experimental results, measured by precision, recall and F1 score, show that our approaches outperform other state-of-the-art NER solutions.
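The fusion step in the first approach can be illustrated with a small sketch. The abstract does not say which fusion operator is used, so the example below assumes the simplest option, concatenation: a single post-level image vector (standing in for InceptionV3 features) is appended to each token's text feature vector. The function name, the toy dimensions and the values are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import List


def fuse_by_concatenation(token_features: List[List[float]],
                          image_features: List[float]) -> List[List[float]]:
    """Append the same post-level image vector to every token vector.

    Assumption: fusion is plain concatenation. The paper may use a
    different operator (for example attention or gating).
    """
    # `tok + image_features` builds a new list, so the inputs are not mutated.
    return [tok + image_features for tok in token_features]


# Hypothetical toy inputs: 3 tokens with 4-dim text features
# and one 2-dim image vector for the whole post.
tokens = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
image = [0.25, 0.75]

fused = fuse_by_concatenation(tokens, image)
```

Each fused vector has the text dimensions followed by the image dimensions, so a downstream sequence tagger sees both modalities at every token position. Concatenation keeps the two feature sets separate and leaves it to later layers to learn how much weight the image should carry.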
