Evaluating Casama: Contextualized semantic maps for summarization of lung cancer studies

OBJECTIVE: It is crucial for clinicians to stay up to date on current literature in order to apply recent evidence to clinical decision making. Automatic summarization systems can help clinicians quickly view an aggregated summary of literature on a topic. Casama, a representation and summarization system based on "contextualized semantic maps," captures the findings of biomedical studies as well as the contexts associated with patient population and study design. This paper presents a user-oriented evaluation of Casama in comparison to a context-free representation, SemRep.

MATERIALS AND METHODS: The effectiveness of the representation was evaluated by presenting users with manually annotated Casama and SemRep summaries of ten articles on driver mutations in cancer. Automatic annotations were evaluated on a collection of articles on EGFR mutation in lung cancer. Seven users completed a questionnaire rating the summarization quality for various topics and applications.

RESULTS: Casama had higher median scores than SemRep for the majority of the topics (p ≤ 0.00032), all of the applications (p ≤ 0.00089), and in overall summarization quality (p ≤ 1.5e-05). Casama's manual annotations outperformed Casama's automatic annotations (p = 0.00061).

DISCUSSION: Casama performed particularly well in the representation of strength of evidence, which was highly rated both quantitatively and qualitatively. Users noted that Casama's less granular, more targeted representation improved usability compared to SemRep.

CONCLUSION: This evaluation demonstrated the benefits of a contextualized representation for summarizing biomedical literature on cancer. Iteration on specific areas of Casama's representation, further development of its algorithms, and a clinically oriented evaluation are warranted.
