UAMNer: uncertainty-aware multimodal named entity recognition in social media posts

Named Entity Recognition (NER) on social media is challenging because posts are usually short and noisy. Recent work has explored several ways of incorporating visual information from the attached image to improve NER on social media, with considerable success. However, existing methods ignore a common scenario on social media: the image sometimes does not match the posted text. Such irrelevant images can introduce noise into existing models. In this paper, we put forward a novel uncertainty-aware framework for multimodal NER (UAMNer) on social media, which combines visual features with text when the text information is insufficient, thus suppressing noise from irrelevant images. Specifically, we propose a two-stage label refinement framework for multimodal NER in social media posts. Given a multimodal post, we first use a Bayesian neural network to produce candidate labels from the text. If the candidate labels have high uncertainty, we then use a multimodal transformer to refine the labels with textual and visual features. We experiment on two public datasets, Twitter-2015 and Twitter-2017. The proposed method achieves better performance than state-of-the-art methods.
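
The two-stage control flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Bayesian network is approximated by several stochastic forward passes per token, that uncertainty is measured as the entropy of the averaged label distribution, and that refinement is triggered per token by a fixed threshold. The names `two_stage_labels`, `refine`, and `threshold` are hypothetical, and `refine` merely stands in for the multimodal transformer.

```python
import math
from typing import Callable, List, Sequence

Probs = Sequence[float]  # one probability per label


def mean_distribution(samples: Sequence[Probs]) -> List[float]:
    """Average the label distributions from several stochastic passes."""
    n_labels = len(samples[0])
    return [sum(s[k] for s in samples) / len(samples) for k in range(n_labels)]


def entropy(dist: Probs) -> float:
    """Shannon entropy in nats; used here as the uncertainty score (assumption)."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def two_stage_labels(
    token_samples: Sequence[Sequence[Probs]],
    label_set: Sequence[str],
    refine: Callable[[List[str], List[int]], List[str]],
    threshold: float,
) -> List[str]:
    """Stage 1: text-only candidates. Stage 2: refine only if some are uncertain."""
    candidates: List[str] = []
    uncertain: List[int] = []
    for i, samples in enumerate(token_samples):
        mean = mean_distribution(samples)
        best = max(range(len(mean)), key=lambda k: mean[k])
        candidates.append(label_set[best])
        if entropy(mean) > threshold:
            uncertain.append(i)
    if not uncertain:
        # Text alone is considered sufficient, so the image is never consulted.
        return candidates
    return refine(candidates, uncertain)


def toy_refine(candidates: List[str], uncertain: List[int]) -> List[str]:
    """Placeholder for the multimodal refinement step (hypothetical behavior)."""
    refined = list(candidates)
    for i in uncertain:
        refined[i] = "B-LOC"
    return refined


labels = ["O", "B-PER", "B-LOC"]
samples = [
    [[0.98, 0.01, 0.01], [0.96, 0.02, 0.02]],  # confident token
    [[0.50, 0.40, 0.10], [0.30, 0.60, 0.10]],  # ambiguous token
]
result = two_stage_labels(samples, labels, toy_refine, threshold=0.5)
```

In this toy run only the second token exceeds the threshold, so only its label is passed to the refinement step; with a sufficiently high threshold the text-only candidates are returned unchanged and the image is ignored.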
