WBI at MEDIQA 2021: Summarizing Consumer Health Questions with Generative Transformers

This paper describes our contribution to the MEDIQA 2021 Task 1 question summarization competition. We model the task as a conditional generation problem. Our concrete pipeline fine-tunes the large pretrained generative transformers PEGASUS (Zhang et al., 2020a) and BART (Lewis et al., 2020). We used the resulting models as strong baselines and experimented with (i) integrating structured knowledge via entity embeddings, (ii) ensembling multiple generative models within a generator-discriminator framework, and (iii) disentangling summarization and interrogative prediction to achieve further improvements. Our best performing model, a fine-tuned vanilla PEGASUS, reached second place in the competition with a ROUGE-2 F1 score of 15.99. We observed that all of our additional measures hurt performance (by up to 5.2 pp) on the official test set. A post-hoc experimental analysis using a larger validation set indicates slight performance improvements through the proposed extensions; however, further analysis is needed to provide stronger evidence.
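
The pipeline itself is standard sequence-to-sequence fine-tuning. As a minimal sketch (not the authors' exact training code), the following shows one PEGASUS fine-tuning step and beam-search inference with the HuggingFace Transformers library [12]; the checkpoint name, hyperparameters, and the toy (question, summary) pair are illustrative assumptions.

```python
# Minimal sketch: fine-tuning PEGASUS for consumer health question
# summarization as conditional generation. Checkpoint, learning rate,
# and the example pair below are assumptions, not the authors' settings.
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-large"  # assumed pretrained checkpoint
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# One (question, summary) pair standing in for the MEDIQA training data.
question = ("I have been taking lisinopril for a year and developed a dry "
            "cough. Is this a known side effect and what can I do about it?")
summary = "Is a dry cough a side effect of lisinopril?"

# PEGASUS uses the same tokenizer for source and target sequences.
inputs = tokenizer(question, truncation=True, max_length=512,
                   return_tensors="pt")
labels = tokenizer(summary, truncation=True, max_length=64,
                   return_tensors="pt").input_ids

# One seq2seq training step: cross-entropy loss on the target summary.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
optimizer.zero_grad()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: generate a question summary with beam search.
model.eval()
with torch.no_grad():
    ids = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```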

[1] Krys J. Kochut, et al. Text Summarization Techniques: A Brief Survey, 2017, International Journal of Advanced Computer Science and Applications.

[2] Yao Zhao, et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, 2020, ICML.

[3] Bob Carpenter, et al. The Benefits of a Model of Annotation, 2013, Transactions of the Association for Computational Linguistics.

[4] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.

[5] Roland Vollgraf, et al. FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP, 2019, NAACL.

[6] Ulf Leser, et al. NLProlog: Reasoning with Weak Unification for Question Answering in Natural Language, 2019, ACL.

[7] Asma Ben Abacha, et al. On the Role of Question Summarization and Information Source Restriction in Consumer Health Question Answering, 2019, AMIA Joint Summits on Translational Science Proceedings.

[8] Maryam Habibi, et al. HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition, 2020, Bioinformatics.

[9] Halil Kilicoglu, et al. Semantic annotation of consumer health questions, 2018, BMC Bioinformatics.

[10] Christopher D. Manning, et al. Get To The Point: Summarization with Pointer-Generator Networks, 2017, ACL.

[11] Dina Demner-Fushman, et al. Overview of the MEDIQA 2021 Shared Task on Summarization in the Medical Domain, 2021, BioNLP.

[12] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.

[13] Kilian Q. Weinberger, et al. BERTScore: Evaluating Text Generation with BERT, 2019, ICLR.

[14] Mirella Lapata, et al. Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization, 2018, EMNLP.

[15] Doug Downey, et al. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks, 2020, ACL.

[16] Natalie Schluter, et al. The limits of automatic summarisation according to ROUGE, 2017, EACL.

[17] Ali Abdalla, et al. Towards Neural Similarity Evaluator, 2019.

[18] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[19] Jaewoo Kang, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining, 2019, Bioinformatics.

[20] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.