Multi-Instance Deep Learning: Discover Discriminative Local Anatomies for Bodypart Recognition

In general image recognition problems, discriminative information often lies in local image patches. For example, most human identity information exists in the image patches containing human faces. The same situation stays in medical images as well. “Bodypart identity” of a transversal slice-which bodypart the slice comes from-is often indicated by local image information, e.g., a cardiac slice and an aorta arch slice are only differentiated by the mediastinum region. In this work, we design a multi-stage deep learning framework for image classification and apply it on bodypart recognition. Specifically, the proposed framework aims at: 1) discover the local regions that are discriminative and non-informative to the image classification problem, and 2) learn a image-level classifier based on these local regions. We achieve these two tasks by the two stages of learning scheme, respectively. In the pre-train stage, a convolutional neural network (CNN) is learned in a multi-instance learning fashion to extract the most discriminative and and non-informative local patches from the training slices. In the boosting stage, the pre-learned CNN is further boosted by these local patches for image classification. The CNN learned by exploiting the discriminative local appearances becomes more accurate than those learned from global image context. The key hallmark of our method is that it automatically discovers the discriminative and non-informative local patches through multi-instance deep learning. Thus, no manual annotation is required. Our method is validated on a synthetic dataset and a large scale CT dataset. It achieves better performances than state-of-the-art approaches, including the standard deep CNN.

[1]  Oded Maron,et al.  Multiple-Instance Learning for Natural Scene Classification , 1998, ICML.

[2]  Ronald M. Summers,et al.  Anatomy-specific classification of medical images using deep convolutional nets , 2015, 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI).

[3]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[6]  Yiqiang Zhan,et al.  Active Scheduling of Organ Detection and Segmentation in Whole-Body Medical Images , 2008, MICCAI.

[7]  Junzhou Huang,et al.  Towards robust and effective shape modeling: Sparse shape composition , 2012, Medical Image Anal..

[8]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Horst Bischof,et al.  Global localization of 3D anatomical structures by pre-filtered Hough Forests and discrete optimization , 2013, Medical Image Anal..

[10]  Qi Zhang,et al.  EM-DD: An Improved Multiple-Instance Learning Technique , 2001, NIPS.

[11]  Bingbing Ni,et al.  HCP: A Flexible CNN Framework for Multi-Label Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Philip H. S. Torr,et al.  BING: Binarized normed gradients for objectness estimation at 300fps , 2014, Computational Visual Media.

[13]  Joel H. Saltz,et al.  Efficient Multiple Instance Convolutional Neural Networks for Gigapixel Resolution Image Classification , 2015, ArXiv.

[14]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[15]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Alexei A. Efros,et al.  Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.

[17]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  S. Thamarai Selvi,et al.  Early Detection of Breast Cancer using SVM Classifier Technique , 2009, ArXiv.

[19]  Frans Vos,et al.  Computer-Aided Detection of Polyps in CT Colonography Using Logistic Regression , 2010, IEEE Transactions on Medical Imaging.

[20]  Florent Perronnin,et al.  High-dimensional signature compression for large-scale image classification , 2011, CVPR 2011.

[21]  Shu Liao,et al.  Bodypart Recognition Using Multi-stage Deep Learning , 2015, IPMI.

[22]  C. V. Jawahar,et al.  Blocks That Shout: Distinctive Parts for Scene Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[24]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Antonio Criminisi,et al.  Regression Forests for Efficient Anatomy Detection and Localization in CT Studies , 2010, MCV.

[26]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[27]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[28]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[29]  Ning Qian,et al.  On the momentum term in gradient descent learning algorithms , 1999, Neural Networks.

[30]  Jiajun Wu,et al.  Deep multiple instance learning for image classification and auto-annotation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[32]  Dimitris N. Metaxas,et al.  Deformable segmentation via sparse representation and dictionary learning , 2012, Medical Image Anal..

[33]  Antonio Criminisi,et al.  Decision Forests with Long-Range Spatial Context for Organ Localization in CT Volumes , 2009 .

[34]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[35]  Dumitru Erhan,et al.  Deep Neural Networks for Object Detection , 2013, NIPS.

[36]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[37]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[39]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[40]  Sung Bum Pan,et al.  A Novel Algorithm for Identification of Body Parts in Medical Images , 2006, FSKD.

[41]  Junzhou Huang,et al.  Deformable Segmentation via Sparse Shape Representation , 2011, MICCAI.

[42]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[43]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[44]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[45]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[46]  Michael Kohnen,et al.  Quality of DICOM header information for image categorization , 2002, SPIE Medical Imaging.

[47]  Tomás Lozano-Pérez,et al.  A Framework for Multiple-Instance Learning , 1997, NIPS.

[48]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[49]  Devi Parikh Recognizing jumbled images: The role of local and global information in image classification , 2011, 2011 International Conference on Computer Vision.

[50]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).