Structured literature image finder: Parsing text and figures in biomedical literature

The Structured Literature Image Finder (SLIF) project combines text mining and image processing to extract structured information from biomedical literature. SLIF extracts images and their captions from published papers. The captions are automatically parsed for relevant biological entities (protein and cell type names), while the images are classified according to their type (e.g., micrograph or gel). Fluorescence microscopy images are further processed and classified according to the depicted subcellular localization. The results can be queried online through either a user-friendly web interface or an XML-based web service. As an alternative to targeted queries, SLIF also supports browsing the collection using latent topic models derived from both the annotated text and the image data. The SLIF web application and the labeled datasets used for training system components are publicly available at http://slif.cbi.cmu.edu.
