Evolving Character-level Convolutional Neural Networks for Text Classification

Character-level convolutional neural networks (char-CNNs) require no knowledge of the semantic or syntactic structure of the language they classify. This property simplifies their implementation but reduces their classification accuracy. Increasing the depth of char-CNN architectures does not result in breakthrough accuracy improvements, and research has not established which char-CNN architectures are optimal for text classification tasks. Manually designing and training char-CNNs is an iterative, time-consuming process that requires expert domain knowledge. Evolutionary deep learning (EDL) techniques, including surrogate-based variants, have demonstrated success in automatically searching for performant CNN architectures for image analysis tasks, but they have not yet been applied to search the architecture space of char-CNNs for text classification. This article presents the first work on evolving char-CNN architectures, using a novel EDL algorithm based on genetic programming, an indirect encoding, and surrogate models to search for performant char-CNN architectures automatically. The algorithm is evaluated on eight text classification datasets and benchmarked against five manually designed CNN architectures and one long short-term memory (LSTM) architecture. Experimental results indicate that the algorithm can evolve architectures that outperform the LSTM in terms of classification accuracy, and the five manually designed CNN architectures in terms of both classification accuracy and parameter count.
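The core idea of surrogate-assisted evolutionary architecture search can be sketched in a few lines. The following is a minimal, illustrative sketch only: the layer vocabulary, the proxy fitness function, and the genetic operators are assumptions for demonstration, not the actual primitive set, surrogate model, or indirect encoding used in this work. In a real system, `surrogate_fitness` would be a learned model predicting validation accuracy, replacing expensive full training of each candidate.

```python
import random

# Hypothetical layer vocabulary (illustrative only; the actual GP
# primitive set used in the article is not specified in the abstract).
LAYER_CHOICES = ["conv3", "conv5", "conv7", "maxpool", "fc"]

def random_arch(max_len=8):
    """Sample a variable-length architecture genome."""
    return [random.choice(LAYER_CHOICES)
            for _ in range(random.randint(2, max_len))]

def surrogate_fitness(arch):
    """Placeholder surrogate: a cheap proxy score standing in for a
    learned model that predicts validation accuracy without training."""
    # Toy heuristic: reward layer-type diversity, penalize depth.
    return len(set(arch)) - 0.1 * len(arch)

def crossover(a, b):
    """One-point crossover on two variable-length genomes."""
    ca = random.randint(1, len(a))
    cb = random.randint(1, len(b))
    return a[:ca] + b[cb:]

def mutate(arch, rate=0.2):
    """Point mutation: resample each gene with probability `rate`."""
    return [random.choice(LAYER_CHOICES) if random.random() < rate else g
            for g in arch]

def evolve(pop_size=20, generations=10, seed=0):
    """Elitist generational loop: score with the surrogate, keep the
    top quarter, and refill the population with mutated offspring."""
    random.seed(seed)
    pop = [random_arch() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=surrogate_fitness, reverse=True)
        elite = scored[: pop_size // 4]
        children = [mutate(crossover(random.choice(elite),
                                     random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=surrogate_fitness)

best = evolve()
```

Only the final evolved architecture would then be trained in full on the target text classification dataset, which is what makes surrogate-based EDL tractable compared with training every candidate from scratch.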
