Trade-off between speed and performance for colorectal endoscopic NBI image classification

This paper investigates the trade-off between computation time and recognition rate in local descriptor-based recognition for colorectal endoscopic NBI image classification. Recent recognition methods using local descriptors have been successfully applied to medical image classification. The accuracy of these methods may depend on the quality of vector quantization (VQ) and of the encoding of descriptors; however, accurate quantization takes a long time. This paper reports how a simple sampling strategy affects performance with different encoding methods. First, we extract about 7.7 million local descriptors from the training images of a dataset of 908 NBI endoscopic images. Second, we randomly choose a subset of between 7.7M and 19K descriptors for VQ. Third, we use three encoding methods (BoVW, VLAD, and Fisher vector) with different numbers of descriptors. A linear SVM is used to classify a three-class problem. The computation time for VQ was reduced by a factor of 100 while the peak performance was retained. The performance improved by roughly 1% to 2% when more descriptors, obtained by over-sampling, were used for encoding. Performance with descriptors extracted at every pixel ("grid1") or at every two pixels ("grid2") is similar, while the computation time is very different: grid2 is 5 to 30 times faster than grid1. The main finding of this work is twofold. First, recent encoding methods such as VLAD and Fisher vector are as insensitive to the quality of VQ as BoVW. Second, there is a trade-off between computation time and performance in encoding over-sampled descriptors with BoVW and Fisher vector, but not with VLAD.
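The sampling strategy described above can be illustrated with a minimal sketch: build a codebook from a random subset of the descriptors, then encode a full descriptor set as a BoVW histogram. This is not the authors' implementation. It assumes plain k-means as the VQ step (the abstract does not name the quantizer), uses toy low-dimensional descriptors, and all function names and parameters (`build_codebook`, `bovw_encode`, `sample_size`, `k`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the codeword closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means (assumed here as the VQ step)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = nearest(p, centers)
            counts[j] += 1
            for d, v in enumerate(p):
                sums[j][d] += v
        for j in range(k):
            if counts[j]:  # an empty cluster keeps its previous center
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers


def build_codebook(all_descriptors, sample_size, k, seed=0):
    """Run VQ on a random subset of the descriptors instead of all of them."""
    rng = random.Random(seed)
    n = min(sample_size, len(all_descriptors))
    subset = rng.sample(all_descriptors, n)
    return kmeans(subset, k, seed=seed)


def bovw_encode(descriptors, centers):
    """BoVW encoding: L1-normalized histogram of nearest-codeword counts."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(d, centers)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

In this sketch the cost of `kmeans` grows with the number of sampled descriptors, so shrinking `sample_size` reduces VQ time directly, whereas `bovw_encode` still runs over every descriptor of the image being encoded. VLAD and Fisher vector encodings would reuse the same sampled codebook idea but aggregate residuals or gradient statistics rather than counts.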
