Continual BERT: Continual Learning for Adaptive Extractive Summarization of COVID-19 Literature

The scientific community continues to publish an overwhelming amount of new COVID-19 research every day, leaving much of this literature with little to no attention. To help the community keep up with the rapidly growing body of COVID-19 literature, we propose a novel BERT architecture that produces brief yet original summaries of lengthy papers. The model continually learns from new data in an online fashion while minimizing catastrophic forgetting, meeting the community's need for up-to-date summarization. Benchmark results and manual examination of its performance show that the model produces sound summaries of new scientific literature.
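The abstract does not spell out how forgetting is mitigated during online updates. As an illustration only, the sketch below shows one standard approach, an elastic weight consolidation (EWC) style quadratic penalty added to the fine-tuning loss of an extractive summarizer; the class and function names (EWCPenalty, estimate_fisher) and the weight lam=0.4 are hypothetical choices, not the paper's method.

    # Illustrative sketch of an EWC-style penalty for continual fine-tuning.
    # Names and hyperparameters here are assumptions, not taken from the paper.
    import torch
    import torch.nn as nn

    class EWCPenalty:
        """Quadratic penalty anchoring parameters to values learned on the
        previous batch of papers, weighted by a diagonal Fisher estimate."""

        def __init__(self, model: nn.Module, fisher: dict, lam: float = 0.4):
            # Snapshot of the old parameters to anchor against.
            self.anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
            self.fisher = fisher  # {param_name: tensor of per-weight importance}
            self.lam = lam        # strength of the consolidation term (assumed value)

        def __call__(self, model: nn.Module) -> torch.Tensor:
            terms = [
                (self.fisher[n] * (p - self.anchor[n]) ** 2).sum()
                for n, p in model.named_parameters()
                if n in self.fisher
            ]
            return self.lam * torch.stack(terms).sum()

    def estimate_fisher(model: nn.Module, data_loader, loss_fn) -> dict:
        """Diagonal Fisher estimate: mean squared gradient over old-task data."""
        fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        model.eval()
        for inputs, targets in data_loader:
            model.zero_grad()
            loss_fn(model(inputs), targets).backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    fisher[n] += p.grad.detach() ** 2
        return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

    # Usage sketch: when fine-tuning on a new batch of papers, the sentence
    # selection loss (e.g. binary cross-entropy over extracted sentences) is
    # combined with the penalty so that weights important to earlier data move less:
    #   penalty = EWCPenalty(summarizer, estimate_fisher(summarizer, old_loader, bce))
    #   loss = bce(summarizer(batch), labels) + penalty(summarizer)

Other mechanisms cited in the continual-learning literature (progressive networks, progress-and-compress, dynamically expandable networks) would replace this penalty with architectural growth or distillation; the quadratic-penalty form is shown only because it is the most compact to sketch.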
