Figure content analysis for improved biomedical article retrieval

Biomedical images are invaluable in medical education and in establishing clinical diagnoses. Clinical decision support (CDS) can be improved by combining biomedical text with automatically annotated images extracted from relevant biomedical publications. In a previous study on the feasibility of automatically classifying images by their usefulness in finding clinical evidence, we reported 76.6% accuracy using supervised machine learning that combined figure captions and image content. Image content extraction is traditionally applied to entire images or to predetermined image regions. Figure images in articles vary greatly, which limits the benefit of whole-image extraction for CDS to little more than gross categorization. However, text annotations and pointers on the figures indicate regions of interest (ROIs) that are then referenced in the caption or in the discussion in the article text. We previously reported 72.02% accuracy in localizing text and symbols, but we did not take advantage of the referenced image locality. In this work we combine article text analysis with figure image analysis to localize pointers (arrows, symbols) and extract the ROIs they point to. These ROIs can then be used to measure meaningful image content and associate it with the identified biomedical concepts, for improved content-based (text and image) retrieval of biomedical articles. Biomedical concepts are identified using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus. Our methods achieve an average precision and recall of 92.3% and 75.3%, respectively, in identifying pointing symbols in images from a randomly selected subset of the images made available through the ImageCLEF 2008 campaign.
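As a reading aid for the precision and recall figures quoted above, the following minimal sketch shows how the two measures are conventionally computed from detection counts. The function name and the counts are hypothetical and chosen only for illustration; the abstract does not state how detections were matched to ground truth or how the per-image values were averaged.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of pointer detections.

    precision = correct detections / all detections reported
    recall    = correct detections / all pointers actually present
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (not from the paper): 9 pointers found correctly,
# 1 spurious detection, 3 pointers missed.
p, r = precision_recall(9, 1, 3)
print(f"precision={p:.3f} recall={r:.3f}")
```

A detector with high precision but lower recall, as reported here, rarely flags something that is not a pointer but misses some of the pointers that are present.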
