Tri-Train: Automatic Pre-Fine Tuning between Pre-Training and Fine-Tuning for SciNER

Training a scientific NER model is commonly performed in two steps: i) pre-training a language model with self-supervised tasks on a huge corpus, and ii) fine-tuning it on a small labeled dataset. The success of this strategy depends on the relevance between the data domains and between the tasks. In practice, however, gaps arise when the target domain is specific and the labeled data are small. We propose a novel framework that introduces a “pre-fine tuning” step between pre-training and fine-tuning. It constructs a corpus by selecting the sentences from unlabeled documents that are most relevant to the labeled training data. Instead of predicting tokens in random spans, the pre-fine tuning task predicts tokens in entity candidates identified by text mining methods. Pre-fine tuning is automatic and lightweight: its corpus can be much smaller than the pre-training data while still yielding better performance. Experiments on seven benchmarks demonstrate the effectiveness of the framework.
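
The framework described above rests on two concrete operations: selecting unlabeled sentences that resemble the labeled training data, and masking entity candidates rather than random spans for the pre-fine tuning objective. The sketch below illustrates one plausible realization, assuming TF-IDF cosine similarity for sentence selection and a pre-mined phrase list (e.g., from a phrase-mining tool) for entity candidates; the function names, the `top_k` cutoff, and the greedy masking scheme are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the two pre-fine-tuning steps, under the assumptions
# stated above. Not the paper's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def select_relevant_sentences(unlabeled_sents, labeled_sents, top_k=10000):
    """Keep the unlabeled sentences most similar to the labeled training data."""
    vectorizer = TfidfVectorizer(stop_words="english")
    unlabeled_vecs = vectorizer.fit_transform(unlabeled_sents)
    labeled_vecs = vectorizer.transform(labeled_sents)
    # Score each unlabeled sentence by its maximum similarity to any labeled sentence.
    scores = cosine_similarity(unlabeled_vecs, labeled_vecs).max(axis=1)
    ranked = sorted(range(len(unlabeled_sents)), key=lambda i: scores[i], reverse=True)
    return [unlabeled_sents[i] for i in ranked[:top_k]]


def mask_entity_candidates(tokens, candidate_phrases, mask_token="[MASK]"):
    """Mask tokens inside entity candidates instead of random spans."""
    masked, labels = [], []
    i = 0
    while i < len(tokens):
        matched = None
        for phrase in candidate_phrases:  # each phrase is a tuple of tokens
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                matched = phrase
                break
        if matched:
            masked.extend([mask_token] * len(matched))
            labels.extend(matched)        # the model predicts these entity tokens
            i += len(matched)
        else:
            masked.append(tokens[i])
            labels.append(None)           # not a prediction target
            i += 1
    return masked, labels
```

The selected sentences, masked in this way, would then serve as the corpus for the pre-fine tuning objective before standard fine-tuning on the labeled NER data.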
