UETrice at MEDIQA 2021: A Prosper-thy-neighbour Extractive Multi-document Summarization Model

This paper describes a system developed for the multi-answer summarization challenge of the MEDIQA 2021 shared task, collocated with the BioNLP 2021 Workshop. We propose an extractive summarization architecture that combines several scores with state-of-the-art techniques, and we present novel prosper-thy-neighbour strategies to improve its performance. Our model proved effective: it achieved the best ROUGE-1 and ROUGE-L scores and was the runner-up by ROUGE-2 F1 score among the 13 participating teams.
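The abstract does not spell out which scores the architecture combines. As a rough, hypothetical illustration of score-based extractive multi-document summarization in general (not the UETrice system itself), the Python sketch below pools the sentences of all answers, scores each one by TF-IDF cosine similarity to the centroid of all sentences, and then selects sentences greedily with Maximal Marginal Relevance (MMR) to limit redundancy. The function names, the trade-off value `lam = 0.7`, and the naive regex sentence splitter are assumptions made for illustration. The prosper-thy-neighbour strategies are not modeled.

```python
# Hypothetical sketch: a generic centroid + MMR extractive summarizer,
# not the UETrice system described in this paper.
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; a biomedical tokenizer would fit better.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(sentences: list[str]) -> list[dict[str, float]]:
    # Sparse TF-IDF vectors, treating each sentence as a "document" for IDF.
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (+1) keeps every weight positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def summarize(answers: list[str], k: int = 2, lam: float = 0.7) -> list[str]:
    # Pool the sentences of all answers: this is the multi-document input.
    sentences = [s.strip() for a in answers
                 for s in re.split(r"(?<=[.!?])\s+", a) if s.strip()]
    vecs = tfidf_vectors(sentences)

    # Salience score: similarity to the centroid (sum) of all sentence vectors.
    centroid: dict[str, float] = {}
    for vec in vecs:
        for t, w in vec.items():
            centroid[t] = centroid.get(t, 0.0) + w
    salience = [cosine(vec, centroid) for vec in vecs]

    # Greedy MMR: trade salience against similarity to already-chosen sentences.
    selected: list[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i: int) -> float:
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected),
                             default=0.0)
            return lam * salience[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)

    # Emit the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]
```

Calling `summarize(answers, k=2)` on a list of answer strings returns up to two extracted sentences in source order. A larger `lam` favors salient sentences; a smaller one penalizes overlap with sentences already chosen more heavily.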
