Structured Literature Image Finder: Extracting Information from Text and Images in Biomedical Literature

SLIF (Structured Literature Image Finder) combines text mining and image processing to extract information from figures in the biomedical literature. It also extends traditional latent topic modeling to provide new ways to traverse the literature. SLIF provides a publicly available, searchable database (http://slif.cbi.cmu.edu). SLIF originally focused on fluorescence microscopy images; we have now extended it to classify panels into more image types. We also improved the classification into subcellular classes by building a more representative training set. To get the most out of the human labeling effort, we used active learning to select which images to label. We developed models that take into account the structure of the document (panels inside figures inside papers) and the multi-modality of the information (free and annotated text, images, and information from external databases). These models have allowed us to provide new ways to navigate a large collection of documents.
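The abstract does not say which active learning strategy was used to choose images for labeling. As an illustration only, the sketch below uses entropy-based uncertainty sampling, one common strategy, to pick the unlabeled panels a classifier is least sure about. The function names, panel ids, and probability values are all hypothetical and are not taken from the system described here.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, k):
    """Return the ids of the k unlabeled images with the highest predictive entropy.

    predictions: dict mapping an image id to a list of class probabilities
    produced by the current classifier (assumed to sum to 1).
    """
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:k]


# Hypothetical classifier outputs over three subcellular classes.
predictions = {
    "panel_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "panel_b": [0.40, 0.35, 0.25],  # very uncertain, high entropy
    "panel_c": [0.70, 0.20, 0.10],  # moderately uncertain
}

# Ask a human to label the two most uncertain panels first.
print(select_for_labeling(predictions, 2))
```

In a labeling loop, the selected panels would be annotated, added to the training set, and the classifier retrained before the next round of selection.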
