A New Method of Improving BERT for Text Classification

Text classification is a fundamental task in natural language processing. Recently, pre-trained models such as BERT have achieved outstanding results compared with previous methods. However, BERT does not fully exploit local information in the text, such as sentences and phrases. In this paper, we present a BERT-CNN model for text classification. By adding a CNN to the task-specific layers of the BERT model, our model captures information from important fragments of the text. In addition, we feed the local representation together with the BERT output into a Transformer encoder to take advantage of the self-attention mechanism, and finally obtain the representation of the whole text through the Transformer layer. Extensive experiments demonstrate that our model obtains competitive performance against state-of-the-art baselines on four benchmark datasets.
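To make the described architecture concrete, the following is a minimal sketch in PyTorch with the Hugging Face Transformers library, assuming token-level BERT outputs are passed through parallel convolutions for local (phrase-level) features, which are then fused with the BERT output in a Transformer encoder layer before classification. The kernel sizes, filter counts, pooling, and layer dimensions are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a BERT-CNN classifier: hyperparameters and pooling are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel


class BertCNNClassifier(nn.Module):
    def __init__(self, num_classes, hidden=768, kernel_sizes=(2, 3, 4), num_filters=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Convolutions over the token sequence capture local n-gram (phrase-level) features.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k, padding=k // 2) for k in kernel_sizes
        )
        self.proj = nn.Linear(num_filters * len(kernel_sizes), hidden)
        # A Transformer encoder layer fuses local (CNN) and global (BERT) representations
        # through self-attention.
        self.fusion = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        seq = out.last_hidden_state                   # (B, T, H): contextual token states
        conv_in = seq.transpose(1, 2)                 # (B, H, T) for Conv1d
        local = torch.cat(
            [torch.relu(conv(conv_in))[:, :, : seq.size(1)] for conv in self.convs],
            dim=1,
        ).transpose(1, 2)                             # (B, T, num_filters * len(kernel_sizes))
        local = self.proj(local)                      # project local features back to H
        # Concatenate global and local token representations along the sequence axis
        # and let self-attention mix them, then mean-pool for the text representation.
        fused = self.fusion(torch.cat([seq, local], dim=1))
        return self.classifier(fused.mean(dim=1))     # logits over classes
```

In this sketch the local features are appended as extra "tokens" so that the self-attention layer can attend jointly over BERT's global states and the CNN's local ones; other fusion choices (e.g., element-wise addition or concatenation along the feature axis) would also be consistent with the abstract's description.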
