A deep-learning model-based and data-driven hybrid architecture for image annotation

Does adding more training data always improve the effectiveness of a machine-learning or pattern-recognition task? Recent evidence in machine translation and speech recognition suggests that the data-driven approach outperforms the traditional model-based approach. Instead of carefully modeling rules and their exceptions, the data-driven approach relies on identifying similar patterns in massive datasets and then uses those patterns to predict the labels (or other outcomes) of unseen instances. In this work, we compare representative data-driven and model-based schemes on an image annotation task. We enumerate the pros and cons of the two approaches and propose a hybrid approach that harnesses the strengths of both.
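To make the data-driven idea concrete, the sketch below (not taken from the paper) illustrates a nearest-neighbor style annotator: an unseen image is labeled by voting over the labels of its most similar images in a large reference collection. The feature extraction step, the cosine-similarity measure, and the function and variable names are all assumptions for illustration.

```python
# Minimal sketch (assumption, not the paper's method) of a data-driven,
# nearest-neighbor annotator: labels for an unseen image are predicted from
# the labels of its most similar images in a large reference set. Feature
# extraction is assumed to happen elsewhere (e.g., a deep model or
# hand-crafted descriptors).
import numpy as np

def annotate(query_feat, ref_feats, ref_labels, k=5):
    """Predict labels for a query image from its k most similar references.

    query_feat : (d,) feature vector of the unseen image
    ref_feats  : (n, d) feature matrix of the reference collection
    ref_labels : list of n label lists, one per reference image
    """
    # Cosine similarity between the query and every reference image.
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    r = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + 1e-12)
    sims = r @ q
    # Vote: collect labels of the k most similar images, ranked by frequency.
    top = np.argsort(-sims)[:k]
    votes = {}
    for i in top:
        for label in ref_labels[i]:
            votes[label] = votes.get(label, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)

# Hypothetical usage with random features standing in for real descriptors.
rng = np.random.default_rng(0)
refs = rng.normal(size=(1000, 128))
labels = [["sky"] if i % 2 else ["grass"] for i in range(1000)]
print(annotate(rng.normal(size=128), refs, labels))
```

The point of the sketch is that prediction quality depends chiefly on the size and coverage of the reference set rather than on a carefully engineered model, which is the trade-off the paper examines against model-based schemes.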
