Semantic preserving distance metric learning and applications

How can we accurately browse a large set of images, or efficiently annotate the images in an image library? Image clustering methods are invaluable tools for applications such as content-based image retrieval and image annotation. These tasks require both features that properly describe the visual and semantic content of images and an accurate distance metric for measuring the dissimilarity between any two images. However, existing methods, which rely on features such as color histograms, edge direction histograms, and shape context, cannot describe semantic content. To solve this problem, we propose a new approach that uses user-provided pairwise constraints to describe the semantic relationship between two images. A Semantic Preserving Distance Metric Learning (SP-DML) algorithm is developed to exploit the complementary characteristics of the visual features and the pairwise constraints in a unified feature space, in which the learned distance metric measures the dissimilarity between two images. Specifically, the manifold structure used in SP-DML is revealed by the images' visual features. To integrate semantic content into distance metric learning, SP-DML uses the pairwise constraints to build semantic patches and aligns these patches to obtain the optimal distance metric for the new feature space. Experimental results on image clustering demonstrate the effectiveness of SP-DML.
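
To make the idea of learning a distance metric from pairwise constraints concrete, here is a minimal sketch in Python. It is not the SP-DML algorithm: it omits the manifold structure and the patch alignment step, and instead uses a generic linear projection that pulls "must-link" pairs together and pushes "cannot-link" pairs apart. The function names `learn_projection` and `learned_distance`, the trade-off parameter `beta`, and the toy data are all assumptions introduced for illustration.

```python
import numpy as np


def learn_projection(X, must_link, cannot_link, dim, beta=1.0):
    """Learn a linear projection W from pairwise constraints (illustrative only).

    X           : (n, D) array of visual features, one row per image.
    must_link   : list of (i, j) index pairs judged semantically similar.
    cannot_link : list of (i, j) index pairs judged semantically dissimilar.
    dim         : dimensionality of the learned feature space.
    beta        : hypothetical trade-off weight between the two constraint sets.

    Minimizes tr(W^T (S_m - beta * S_c) W) subject to W^T W = I, where S_m and
    S_c are scatter matrices of must-link and cannot-link difference vectors.
    """
    D = X.shape[1]
    S_m = np.zeros((D, D))
    S_c = np.zeros((D, D))
    for i, j in must_link:
        d = X[i] - X[j]
        S_m += np.outer(d, d)
    for i, j in cannot_link:
        d = X[i] - X[j]
        S_c += np.outer(d, d)
    # eigh returns eigenvalues in ascending order, so the first `dim`
    # eigenvectors minimize the trace objective under orthonormality.
    _, eigvecs = np.linalg.eigh(S_m - beta * S_c)
    return eigvecs[:, :dim]


def learned_distance(W, x, y):
    """Euclidean distance after projection, i.e. a Mahalanobis distance with M = W W^T."""
    return float(np.linalg.norm(W.T @ (x - y)))


# Toy data: axis 0 carries the semantic difference, axis 1 is nuisance variation.
X = np.array([[0.0, 0.0], [0.0, 5.0], [3.0, 0.0], [3.0, 5.0]])
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]

W = learn_projection(X, must_link, cannot_link, dim=1)
print("similar pair distance:   ", learned_distance(W, X[0], X[1]))
print("dissimilar pair distance:", learned_distance(W, X[0], X[2]))
```

In this toy setting the learned projection discards the nuisance axis, so the must-link pair ends up closer than the cannot-link pair even though it is farther apart in the raw feature space. A clustering method such as k-means could then be run on the projected features `X @ W`.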
