Revisiting and Advancing Chinese Natural Language Understanding with Accelerated Heterogeneous Knowledge Pre-training

Recently, knowledge-enhanced pre-trained language models (KEPLMs) have improved context-aware representations by learning from structured relations in knowledge graphs and/or from linguistic knowledge obtained through syntactic or dependency analysis. Unlike for English, the natural language processing (NLP) community lacks high-performing open-source Chinese KEPLMs to support various language understanding applications. In this paper, we revisit and advance the development of Chinese natural language understanding with a series of novel Chinese KEPLMs released in various parameter sizes, namely CKBERT (Chinese knowledge-enhanced BERT). Specifically, both relational and linguistic knowledge are effectively injected into CKBERT through two novel pre-training tasks: linguistic-aware masked language modeling and contrastive multi-hop relation modeling. Based on these two pre-training paradigms and our in-house TorchAccelerator implementation, we have efficiently pre-trained base (110M), large (345M), and huge (1.3B) versions of CKBERT on GPU clusters. Experiments demonstrate that CKBERT outperforms strong Chinese baselines across various benchmark NLP tasks and across different model sizes.
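
The abstract names contrastive multi-hop relation modeling without spelling out its objective. The sketch below is a minimal, hypothetical illustration of how such a task could be set up, assuming an InfoNCE-style contrastive loss in PyTorch over an entity's contextual representation, one multi-hop related entity as the positive, and sampled unrelated entities as negatives; all function names, tensor shapes, and the temperature value are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch: InfoNCE-style contrastive loss for multi-hop
# relation modeling. Names and shapes are assumptions for illustration.
import torch
import torch.nn.functional as F


def contrastive_relation_loss(anchor, positive, negatives, tau=0.07):
    """Pull the anchor entity representation towards a multi-hop related
    entity (positive) and push it away from unrelated entities (negatives).

    anchor:    (batch, dim) contextual embedding of the target entity
    positive:  (batch, dim) embedding of a multi-hop neighbour in the KG
    negatives: (batch, num_neg, dim) embeddings of sampled unrelated entities
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Cosine similarities scaled by the temperature tau.
    pos_sim = (anchor * positive).sum(-1, keepdim=True) / tau      # (batch, 1)
    neg_sim = torch.einsum("bd,bnd->bn", anchor, negatives) / tau  # (batch, num_neg)

    # The positive sits at index 0 of the logits; cross-entropy then
    # maximizes its probability against all negatives.
    logits = torch.cat([pos_sim, neg_sim], dim=-1)
    labels = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    batch, dim, num_neg = 4, 768, 8
    loss = contrastive_relation_loss(
        torch.randn(batch, dim),
        torch.randn(batch, dim),
        torch.randn(batch, num_neg, dim),
    )
    print(loss.item())

Under this assumed setup, negatives would typically be entities drawn from unrelated subgraphs, so the encoder learns to separate multi-hop related entities from random ones; the exact sampling strategy and loss formulation used for CKBERT are described in the full paper.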
