Towards Evidence Extraction: Analysis of Scientific Figures from Studies of Molecular Interactions

Scientific figures, captions, and accompanying text are a valuable resource: together they comprise the evidence generated by a published scientific study. Extracting information about that evidence requires a pipeline made up of several intermediate steps. We describe a machine reading analysis of papers that have been curated into the European Bioinformatics Institute’s IntAct database of molecular interactions. We unpack multiple steps in an extraction pipeline that ultimately attempts to automatically identify the type of experiment being performed. We apply machine vision and natural language processing to classify figures and their associated text by the type of method used in the experiment, to a level of accuracy that can likely support future biocuration tasks.
