Semisupervised Multiview Distance Metric Learning for Cartoon Synthesis

In image processing, cartoon character classification, retrieval, and synthesis are critical because they let cartoonists make cartoons effectively and efficiently by reusing existing cartoon data. To achieve these tasks, it is essential to extract visual features that comprehensively represent cartoon characters and to construct an accurate distance metric that measures the dissimilarities between them. In this paper, we introduce three visual features (color histogram, shape context, and skeleton) to characterize the color, shape, and action, respectively, of a cartoon character. These three features are complementary, and each feature set is regarded as a single view. However, it is improper to concatenate them into one long vector: they have different physical properties, and the resulting high-dimensional feature vector suffers from the so-called curse of dimensionality. Hence, we propose a semisupervised multiview distance metric learning (SSM-DML) method. SSM-DML learns the multiview distance metrics from multiple feature sets and from the labels of unlabeled cartoon characters simultaneously, under the umbrella of graph-based semisupervised learning. It discovers the complementary characteristics of the different feature sets through an alternating optimization-based iterative algorithm. As a result, SSM-DML accomplishes cartoon character classification and dissimilarity measurement at the same time. On the basis of SSM-DML, we develop a novel system comprising three modules: multiview cartoon character classification, multiview graph-based cartoon synthesis, and multiview retrieval-based cartoon synthesis. Experimental evaluations of the three modules suggest that SSM-DML is effective in cartoon applications.
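The abstract does not give the SSM-DML objective, so the following is only a generic sketch of the ingredients it names: one graph per view, graph-based semisupervised label inference, and alternating optimization over view weights. It is not the authors' algorithm and it does not learn a distance metric. It assumes Gaussian affinity graphs, a standard label-spreading style objective, and a closed-form weight update; the function names `normalized_laplacian` and `multiview_propagation` and the parameters `r`, `mu`, and `n_iter` are hypothetical.

```python
import numpy as np

def normalized_laplacian(X, sigma=None):
    """Normalized graph Laplacian of one view from a Gaussian affinity (assumed choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        # Heuristic bandwidth: root of the median nonzero squared distance.
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return np.eye(len(X)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

def multiview_propagation(views, labels, n_classes, r=2.0, mu=1.0, n_iter=10):
    """Alternate between label scores F and view weights alpha.

    Assumed objective (not taken from the paper):
        min_{F, alpha}  sum_v alpha_v^r tr(F^T L_v F) + mu * ||F - Y||^2
        s.t. sum_v alpha_v = 1, alpha_v >= 0, r > 1.
    labels uses -1 for unlabeled samples.
    """
    n = len(labels)
    Y = np.zeros((n, n_classes))
    labeled = labels >= 0
    Y[labeled, labels[labeled]] = 1.0

    laplacians = [normalized_laplacian(X) for X in views]
    alpha = np.full(len(views), 1.0 / len(views))

    for _ in range(n_iter):
        # Step 1: fix alpha, solve (L + mu I) F = mu Y for the label scores.
        L = sum(a ** r * Lv for a, Lv in zip(alpha, laplacians))
        F = np.linalg.solve(L + mu * np.eye(n), mu * Y)
        # Step 2: fix F, update alpha in closed form; views whose graphs
        # agree with the current labeling (low smoothness cost) gain weight.
        costs = np.array([np.trace(F.T @ Lv @ F) for Lv in laplacians])
        inv = (1.0 / np.maximum(costs, 1e-12)) ** (1.0 / (r - 1.0))
        alpha = inv / inv.sum()

    return F.argmax(axis=1), alpha
```

In this sketch each entry of `views` would hold one feature set per character (for example color histogram, shape context, or skeleton descriptors), and the returned `alpha` indicates how much each view's graph contributed to the final labeling.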
