Classifying Biomedical Figures Using Combination of Bag of Keypoints and Bag of Words

Figures in full-text scientific papers contain much information that is not described in the main text, and making use of these figures is an important problem. In this study, we developed a method to classify biomedical figures into categories using a combination of a bag-of-keypoints approach for the images and a bag-of-words approach for the legends. For bag of keypoints, the descriptors of detected interest points are quantized by k-means clustering and converted into elements of a feature vector. Figure classification is carried out using a one-against-all multi-class support vector machine. When the Harris-affine detector and the extended scale-invariant feature transform (SIFT) descriptor are used for interest point detection and feature description, respectively, the classification accuracy of bag of keypoints is about 20% better than that of field-level image descriptors, which are similar to the descriptors used in previous studies. Further, when bag of words for legends is combined with this method, the classification accuracy reaches 75.7%. Introducing bag of keypoints not only increases classification performance but also allows figures to be treated like text words. This method is expected to be useful for figure similarity search and for information retrieval across figures and text.
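The quantization step described above can be illustrated with a minimal sketch. It assumes toy two-dimensional descriptors in place of real SIFT descriptors, a plain Lloyd-style k-means, whitespace tokenization of legends, and simple concatenation of the two histograms; none of these details, nor the function names, are taken from the study. Interest point detection and the support vector machine are not reproduced here.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, d):
    """Index of the centroid (visual word) closest to descriptor d."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], d))


def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; returns k centroids (the visual vocabulary)."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centroids, d)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def keypoint_histogram(descriptors, centroids):
    """Count how many descriptors of one figure fall on each visual word."""
    counts = Counter(nearest(centroids, d) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(centroids))]


def word_histogram(legend, vocabulary):
    """Count occurrences of each vocabulary word in a figure legend."""
    counts = Counter(legend.lower().split())
    return [counts.get(w, 0) for w in vocabulary]


def figure_vector(descriptors, legend, centroids, vocabulary):
    """Assumed combination: concatenate keypoint and word histograms."""
    return keypoint_histogram(descriptors, centroids) + word_histogram(legend, vocabulary)


# Toy data: hypothetical 2-D "descriptors" for two figures.
descs_a = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]
descs_b = [(5.1, 5.0), (4.9, 5.0)]
centroids = kmeans(descs_a + descs_b, k=2)
vocabulary = ["gel", "western", "blot"]
vec_a = figure_vector(descs_a, "Western blot of gel", centroids, vocabulary)
```

The resulting vector has one element per visual word followed by one element per legend word, so it could be passed to any multi-class classifier, such as the one-against-all support vector machine mentioned above.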
