Multi-modal subspace learning with dropout regularization for cross-modal recognition and retrieval

Cross-modal recognition and retrieval have attracted a surge of interest in recent multimedia research. Towards this goal, we investigate a multi-modal subspace learning algorithm combined with a Dropout regularizer. Inspired by Dropout regularization for neural networks, we propose to artificially remove the effect of a certain fraction of feature bins, using a probabilistic approach, to prevent the linear subspace learning from over-fitting. The novel regularizer is integrated into a multi-modal learning algorithm that maximizes the between-class scatter while minimizing the within-class scatter in the projected latent space. The new objective function can be solved efficiently as a generalized eigenvalue problem. Experimental results show that the approach achieves superior performance in both face-sketch recognition and cross-modal retrieval applications.
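As a rough illustration of how a Dropout-style regularizer can enter a scatter-based objective and still lead to a generalized eigenvalue problem, the following minimal sketch may help. It is not the formulation of this paper: it handles a single modality only, and it assumes that each centered feature is kept independently with probability `keep_prob` and rescaled by `1/keep_prob`, in which case the expected within-class scatter is the original scatter plus a scaled copy of its diagonal. The function names `dropout_scatter` and `fit_dropout_lda` and the small ridge term are hypothetical choices made for this example.

```python
import numpy as np


def dropout_scatter(S, keep_prob):
    """Expected scatter under the assumed dropout model.

    If each centered feature is kept with probability keep_prob and
    rescaled by 1/keep_prob, off-diagonal entries are unchanged in
    expectation and diagonal entries are scaled by 1/keep_prob.
    """
    return S + ((1.0 - keep_prob) / keep_prob) * np.diag(np.diag(S))


def fit_dropout_lda(X, y, n_components, keep_prob=0.8):
    """Single-view discriminant projection with a dropout-style regularizer.

    Solves Sb w = lambda * Sw_reg w, where Sw_reg is the
    dropout-marginalized within-class scatter (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean = X.mean(axis=0)

    Sb = np.zeros((d, d))  # between-class scatter
    Sw = np.zeros((d, d))  # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
        Xc0 = Xc - mc
        Sw += Xc0.T @ Xc0

    # Small ridge keeps the matrix positive definite for the Cholesky step.
    Sw_reg = dropout_scatter(Sw, keep_prob) + 1e-8 * np.eye(d)

    # Reduce the generalized problem to a standard symmetric one:
    # with Sw_reg = L L^T, solve (L^-1 Sb L^-T) v = lambda v, then w = L^-T v.
    L = np.linalg.cholesky(Sw_reg)
    Linv = np.linalg.inv(L)
    M = Linv @ Sb @ Linv.T
    evals, evecs = np.linalg.eigh(M)

    order = np.argsort(evals)[::-1][:n_components]
    W = Linv.T @ evecs[:, order]
    return W, evals[order]
```

Under these assumptions, a smaller `keep_prob` inflates the diagonal of the within-class scatter more strongly, which penalizes projections that rely heavily on a few high-variance feature bins. A multi-modal variant would build the scatter matrices from per-modality blocks, which this sketch does not attempt.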
