Robust and discriminative distance for Multi-Instance Learning

Multi-Instance Learning (MIL) is an emerging topic in machine learning with broad applications in computer vision. For example, by treating video classification as a MIL problem, in which only labeled video clips (such as tagged online videos) are needed rather than labeled video frames, one can reduce the labeling cost, which is typically very high. We propose a novel class-specific distance Metrics enhanced Class-to-Bag distance (M-C2B) method to learn a robust and discriminative distance for multi-instance data. The method employs the not-squared ℓ2-norm distance to address the most difficult challenge in MIL, namely the outlier instances that abound in multi-instance data by nature. As a result, the formulated objective is a simultaneous ℓ2,1-norm minimization and maximization (minmax) problem, which is very hard to solve in general because the ℓ2,1-norm is non-smooth. We therefore present an efficient iterative algorithm, with rigorously proved convergence, to solve the general ℓ2,1-norm minmax problem. To the best of our knowledge, we are the first to solve a general ℓ2,1-norm minmax problem in the literature. Extensive experiments evaluating various aspects of the proposed method yield promising results that validate it for cost-effective video classification.
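The robustness argument for the not-squared ℓ2-norm can be illustrated with a minimal sketch. It is not the M-C2B objective or the proposed algorithm; the function names and residual values below are assumptions chosen only for illustration. It compares how much a single outlier instance contributes to a sum of not-squared ℓ2 norms (an ℓ2,1-norm over rows) and to a sum of squared ℓ2 norms.

```python
import math

def l21_norm(rows):
    """Sum of the not-squared l2 norms of the rows (the l2,1-norm)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in rows)

def squared_l2_sum(rows):
    """Sum of the squared l2 norms of the rows (squared Frobenius norm)."""
    return sum(x * x for row in rows for x in row)

# Hypothetical per-instance residual vectors (illustrative values only).
inliers = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]]
outlier = [30.0, 40.0]
with_outlier = inliers + [outlier]

# Fraction of each total that is due to the single outlier instance.
share_l21 = l21_norm([outlier]) / l21_norm(with_outlier)
share_squared = squared_l2_sum([outlier]) / squared_l2_sum(with_outlier)

print(f"outlier share, not-squared l2 (l2,1-norm): {share_l21:.3f}")
print(f"outlier share, squared l2:                 {share_squared:.3f}")
```

In this toy case the outlier accounts for a smaller share of the not-squared total than of the squared total, which is the intuition behind preferring the not-squared distance when outlier instances are common.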
