Distributed learning of deep feature embeddings for visual recognition tasks

Deep learning has substantially improved the state of the art in understanding the visual content of images. Much of the recent progress has been measured in the context of the ImageNet Large Scale Visual Recognition Challenge, which uses a modest subset of the full ImageNet dataset: 1.2 million images labeled according to 1,000 concepts. Few published results have applied learning to the full ImageNet dataset of 14 million images over nearly 22,000 concepts. This is partly due to the substantial time and computational resources needed to train adequately on such a large dataset, even when graphics processing units are exploited. To achieve this scale of training, we use Phalanx, a distributed deep learning framework being developed by IBM. Phalanx has a hub-and-spoke design: a parameter server acts as the hub, and multiple learners that employ the open source Caffe platform act as the spokes. Using Phalanx on the full ImageNet dataset, we performed experiments that demonstrate the impact of large-scale learning on multiple training scenarios. The scenarios covered in this paper include fine-tuning, where a pretrained model is used as the basis for further training, as well as the use of pretrained models for learning deep feature embeddings.
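
To make the hub-and-spoke arrangement concrete, the following is a minimal sketch of parameter-server training, not Phalanx itself. It assumes a toy linear model in place of a Caffe network, runs the learners sequentially in one process rather than across machines, and uses hypothetical names throughout (`ParameterServer`, `Learner`, `pull`, `push`, `train`); none of these are taken from the Phalanx or Caffe APIs.

```python
import random


class ParameterServer:
    """Hub: holds the shared weights and applies gradients pushed by learners.
    Hypothetical class for illustration only, not the Phalanx API."""

    def __init__(self, dim, lr=0.1):
        self.weights = [0.0] * dim
        self.lr = lr

    def pull(self):
        # A learner fetches a copy of the current weights.
        return list(self.weights)

    def push(self, grad):
        # A learner sends a gradient; the hub takes one SGD step with it.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grad)]


class Learner:
    """Spoke: owns one shard of the data and computes gradients locally."""

    def __init__(self, shard):
        self.shard = shard

    def gradient(self, weights):
        # Gradient of mean squared error for the linear model y = w . x
        grad = [0.0] * len(weights)
        for x, y in self.shard:
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(self.shard)
        return grad


def train(num_learners=4, rounds=200, seed=0):
    rng = random.Random(seed)
    true_w = [2.0, -3.0]  # weights the toy model should recover
    data = []
    for _ in range(200):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, sum(w * xi for w, xi in zip(true_w, x))))

    # Partition the dataset so each learner sees a disjoint shard.
    shards = [data[i::num_learners] for i in range(num_learners)]
    server = ParameterServer(dim=2, lr=0.1)
    learners = [Learner(shard) for shard in shards]

    for _ in range(rounds):
        for learner in learners:
            # Pull the current weights, compute locally, push the gradient.
            server.push(learner.gradient(server.pull()))
    return server.weights


if __name__ == "__main__":
    print(train())
```

In a real deployment the learners would run concurrently on separate machines and exchange weights and gradients with the hub over the network; the sequential loop here only illustrates the pull, compute, push cycle.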
