Bag-of-Colors for Biomedical Document Image Classification

The number of biomedical publications has increased noticeably in the last 30 years. Clinicians and medical researchers regularly have unmet information needs, yet finding publications relevant to a clinical situation requires more search time than they usually have available. The techniques described in this article classify images from the open-access biomedical literature into categories, which can potentially reduce search time. Classification uses only the visual information of the images and is evaluated on the ImageCLEF 2011 benchmark database, which was created for the image classification and image retrieval tasks. We evaluate in particular the importance of color in addition to the frequently used texture and grey-level features. Results show that bag-of-colors features combined with the Scale-Invariant Feature Transform (SIFT) provide an image representation that improves classification quality. Accuracy improved from 69.75% for the best ImageCLEF 2011 system using only visual information to 72.5% for the system described in this paper. The results highlight the importance of color for the classification of biomedical images.
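
The abstract does not spell out how a bag-of-colors descriptor is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: a small color vocabulary (palette) is learned by k-means over training pixels, and each image is then described by a normalized histogram of nearest-palette-color assignments. The function names, the use of raw RGB values, the palette size, and the toy pixel data are all illustrative assumptions, not the paper's actual settings; the combination with SIFT is not shown.

```python
import random

def nearest(color, palette):
    """Index of the palette color closest to `color` (squared Euclidean distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((c - p) ** 2 for c, p in zip(color, palette[i])))

def learn_palette(pixels, k, iterations=10, seed=0):
    """Plain k-means over pixel colors; returns k cluster centers (assumed RGB)."""
    rng = random.Random(seed)
    palette = rng.sample(pixels, k)  # initialize centers from the training pixels
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for px in pixels:
            clusters[nearest(px, palette)].append(px)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                palette[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return palette

def bag_of_colors(pixels, palette):
    """L1-normalized histogram counting how many pixels map to each palette color."""
    hist = [0.0] * len(palette)
    for px in pixels:
        hist[nearest(px, palette)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy data (hypothetical): near-white, near-black, and reddish training pixels.
training_pixels = [(250, 250, 250), (245, 248, 252), (10, 10, 12),
                   (0, 5, 8), (200, 30, 40), (210, 20, 35)]
palette = learn_palette(training_pixels, k=3)

# A four-pixel "image": three white pixels and one reddish pixel.
image = [(255, 255, 255)] * 3 + [(205, 25, 38)]
features = bag_of_colors(image, palette)
```

The resulting `features` vector has one entry per palette color and sums to 1, so images of different sizes are comparable. In a pipeline like the one the abstract describes, such a vector could be combined with a SIFT-based representation before classification; how that fusion is done is not specified here.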
