A Discriminative Learning Framework with Pairwise Constraints for Video Object Classification

To address the shortage of labeled data in video object classification, one solution is to use additional pairwise constraints that indicate the relationship between two examples, i.e., whether they belong to the same class or not. In this paper, we propose a discriminative learning approach that incorporates pairwise constraints into a conventional margin-based learning framework. Unlike previous work, which usually attempts to learn better distance metrics or to estimate the underlying data distribution, the proposed approach models the decision boundary directly and thus requires fewer model assumptions. Moreover, it handles both labeled data and pairwise constraints in a unified framework. We investigate two families of pairwise loss functions, convex and nonconvex, and derive three pairwise learning algorithms by plugging in the hinge loss and the logistic loss functions. The proposed learning algorithms were evaluated on a people identification task using two surveillance video data sets. In these experiments, the proposed pairwise learning algorithms considerably outperformed both the baseline classifiers trained only on labeled data and two other pairwise learning algorithms given the same number of pairwise constraints.
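The idea of adding a pairwise term to a margin-based objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the abstract does not give the exact pairwise loss, so the convex form used here (a logistic loss applied symmetrically to `f(x_a) - c * f(x_b)`), the linear model `f(x) = w.x`, the function name `pairwise_objective`, and the weights `mu` and `lam` are all hypothetical choices made for the example.

```python
import numpy as np

def logistic_loss(z):
    """Logistic loss log(1 + exp(-z)), computed in a numerically stable way."""
    return np.logaddexp(0.0, -z)

def pairwise_objective(w, X_lab, y_lab, X_a, X_b, c, mu=1.0, lam=0.1):
    """Labeled loss + mu * pairwise loss + lam * ||w||^2 for f(x) = w.x.

    X_lab, y_lab : labeled examples with labels in {-1, +1}
    X_a, X_b     : the two sides of each constrained pair (row k pairs with row k)
    c            : +1 for a "same class" pair, -1 for a "different class" pair
    The pairwise term is an assumed illustrative form, not taken from the paper.
    """
    # Standard margin-based loss on the labeled examples.
    labeled = logistic_loss(y_lab * (X_lab @ w)).sum()
    # d is linear in w, so the symmetric loss below is convex in w and is
    # smallest when f(x_a) = c * f(x_b), i.e. when the pair agrees with c.
    d = X_a @ w - c * (X_b @ w)
    pairwise = (logistic_loss(d) + logistic_loss(-d)).sum()
    return labeled + mu * pairwise + lam * (w @ w)

# Toy data: two labeled examples and one "different class" constraint.
X_lab = np.array([[1.0, 0.0], [-1.0, 0.5]])
y_lab = np.array([1.0, -1.0])
X_a = np.array([[0.8, 0.1]])
X_b = np.array([[-0.9, 0.2]])
c = np.array([-1.0])

print(pairwise_objective(np.zeros(2), X_lab, y_lab, X_a, X_b, c))
```

Because the labeled and pairwise terms share the same parameter vector `w`, a single optimizer can minimize both at once, which is one way to read the abstract's "unified framework". A nonconvex pairwise loss would need a different functional form and is not shown here.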
