Paper Information - Classifying Long Clinical Documents with Pre-trained Transformers

Automatic phenotyping is the task of identifying cohorts of patients that match a predefined set of criteria. Phenotyping typically involves classifying long clinical documents that contain thousands of tokens. At the same time, recent state-of-the-art transformer-based pre-trained language models limit the input to a few hundred tokens (e.g., 512 tokens for BERT). We evaluate several strategies for incorporating pre-trained sentence encoders into document-level representations of clinical text, and find that hierarchical transformers without pre-training are competitive with task pre-trained models.
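The mismatch between document length and encoder input limit can be illustrated with a minimal sketch. It assumes one simple strategy: split the token sequence into consecutive chunks of at most 512 tokens, encode each chunk, and average the chunk vectors into a document vector. The function names and the `encode_chunk` callable are hypothetical placeholders, and mean pooling is used only for illustration; the abstract does not specify which aggregation strategies the paper evaluates.

```python
from typing import Callable, List, Sequence


def split_into_chunks(tokens: Sequence[str], max_len: int = 512) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [list(tokens[i:i + max_len]) for i in range(0, len(tokens), max_len)]


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length chunk vectors into a single document vector."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def encode_document(
    tokens: Sequence[str],
    encode_chunk: Callable[[List[str]], List[float]],
    max_len: int = 512,
) -> List[float]:
    """Encode a long document chunk by chunk, then pool the chunk vectors.

    encode_chunk is a hypothetical stand-in for a pre-trained encoder
    (for example, a function returning a BERT sentence embedding).
    """
    chunks = split_into_chunks(tokens, max_len)
    return mean_pool([encode_chunk(chunk) for chunk in chunks])


def toy_encoder(chunk: List[str]) -> List[float]:
    """Toy encoder used only to make the sketch runnable: [chunk length, 1.0]."""
    return [float(len(chunk)), 1.0]


# A 1200-token document becomes chunks of 512, 512, and 176 tokens.
document = ["tok"] * 1200
chunk_lengths = [len(c) for c in split_into_chunks(document)]
document_vector = encode_document(document, toy_encoder)
```

In a hierarchical model, the pooling step would typically be replaced by a second, learned encoder over the chunk vectors; the averaging here only keeps the sketch dependency-free.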
