Summarization of biomedical articles using domain-specific word embeddings and graph ranking

Text summarization tools can help biomedical researchers and clinicians reduce the time and effort needed to acquire important information from large numbers of documents. Previous work has shown that an input text can be modeled as a graph, and that important sentences can be selected by identifying central nodes within that graph. However, representing documents effectively, quantifying the relatedness of sentences, and selecting the most informative sentences remain the main challenges in graph-based summarization. In this paper, we address these challenges in the context of biomedical text summarization. We evaluate the efficacy of a graph-based summarizer using different types of context-free and contextualized embeddings. The word representations are produced by pre-training neural language models on large corpora of biomedical texts. The summarizer models the input text as a graph in which the strength of the relations between sentences is measured using these domain-specific vector representations. We also assess the usefulness of different graph ranking techniques in the sentence selection step of our summarization method. Using the common Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, we evaluate the performance of our summarizer against various comparison methods. The results show that when the summarizer uses suitable combinations of context-free and contextualized embeddings, along with an effective ranking method, it can outperform the comparison methods. We demonstrate that the best settings of our graph-based summarizer can improve the informative content of summaries and decrease their redundancy.
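The pipeline described above (sentences as nodes, embedding similarity as edge weights, graph ranking for sentence selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the cosine similarity measure, the damping factor, and the toy two-dimensional vectors are all assumptions made for illustration, and PageRank stands in for whichever ranking technique performs best in the paper's experiments. In practice the sentence vectors would come from a language model pre-trained on biomedical text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(embeddings, threshold=0.0):
    """Symmetric weight matrix: edge (i, j) holds the similarity of sentences i and j.

    Pairs whose similarity does not exceed `threshold` get no edge
    (the threshold is an assumed, illustrative parameter).
    """
    n = len(embeddings)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(embeddings[i], embeddings[j])
            if sim > threshold:
                weights[i][j] = weights[j][i] = sim
    return weights


def pagerank(weights, damping=0.85, iterations=100, tol=1e-9):
    """Weighted PageRank by power iteration.

    Simplification: nodes without edges only receive the teleport share,
    so scores sum to 1 only when every node has at least one edge.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / strength[j]
                for j in range(n)
                if strength[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        converged = max(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores


def summarize(sentences, embeddings, k=2):
    """Return the k highest-ranked sentences, kept in document order."""
    scores = pagerank(build_graph(embeddings))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy input: hypothetical vectors standing in for real sentence embeddings.
sentences = ["Sentence A.", "Sentence B.", "Sentence C."]
embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
summary = summarize(sentences, embeddings, k=2)
```

In this toy input the first two vectors are nearly parallel and the third is almost orthogonal to both, so the first two sentences accumulate most of the ranking mass and are selected. Swapping `pagerank` for another centrality measure changes only the sentence selection step, which is the comparison the abstract describes.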
