Multi-taskmulti-labelmultiple instance learning

For automatic object detection tasks, large amounts of training images are usually labeled to achieve more reliable training of the object classifiers; this is cost-expensive since it requires hiring professionals to label large-scale training images. When a large number of object classes come into view, the issue of obtaining a large enough amount of the labeled training images becomes more critical. There are three potential solutions to reduce the burden for image labeling: (1) allowing people to provide the object labels loosely at the image level rather than at the object level (e.g., loosely-tagged images without identifying the exact object locations in the images); (2) harnessing large-scale collaboratively-tagged images that are available on the Internet; and, (3) developing new machine learning algorithms that can directly leverage large-scale collaboratively- or loosely-tagged images for achieving more effective training of a large number of object classifiers. Based on these observations, a multi-task multi-label multiple instance learning (MTML-MIL) algorithm is developed in this paper by leveraging both interobject correlations and large-scale loosely-labeled images for object classifier training. By seamlessly integrating multi-task learning, multi-label learning, and multiple instance learning, our MTML-MIL algorithm can achieve more accurate training of a large number of inter-related object classifiers (where an object network is constructed for determining the inter-related learning tasks directly in the feature space rather than in the label space). Our experimental results have shown that our MTML-MIL algorithm can achieve higher detection accuracy rates for automatic object detection.

[1]  Oded Maron,et al.  Multiple-Instance Learning for Natural Scene Classification , 1998, ICML.

[2]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Meng Wang,et al.  Correlative Linear Neighborhood Propagation for Video Annotation , 2009, IEEE Trans. Syst. Man Cybern. Part B.

[4]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, CVPR 2004.

[5]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Jianping Fan,et al.  Multi-level annotation of natural scenes using dominant image components and semantic concepts , 2004, MULTIMEDIA '04.

[7]  Jiebo Luo,et al.  Learning multi-label scene classification , 2004, Pattern Recognit..

[8]  Jianping Fan,et al.  Mining Multilevel Image Semantics via Hierarchical Classification , 2008, IEEE Transactions on Multimedia.

[9]  B. S. Manjunath,et al.  Color image segmentation , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[10]  Jianping Fan,et al.  Incorporating Concept Ontology for Hierarchical Video Classification, Annotation, and Visualization , 2007, IEEE Transactions on Multimedia.

[11]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[12]  Charles A. Micchelli,et al.  Learning Multiple Tasks with Kernel Methods , 2005, J. Mach. Learn. Res..

[13]  Igor Durdanovic,et al.  Parallel Support Vector Machines: The Cascade SVM , 2004, NIPS.

[14]  Jianping Fan,et al.  Harvesting large-scale weakly-tagged image databases from the web , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Dominik Brugger,et al.  Parallel Support Vector Machines , 2006 .

[16]  Qi Zhang,et al.  Content-Based Image Retrieval Using Multiple-Instance Learning , 2002, ICML.

[17]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[18]  Tao Mei,et al.  Correlative multi-label video annotation , 2007, ACM Multimedia.

[19]  Shih-Fu Chang,et al.  Context-Based Concept Fusion with Boosted Conditional Random Fields , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[20]  Jianping Fan,et al.  Integrating Concept Ontology and Multitask Learning to Achieve More Effective Classifier Training for Multilevel Image Annotation , 2008, IEEE Transactions on Image Processing.

[21]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[22]  Martial Hebert,et al.  Discriminative Random Fields , 2006, International Journal of Computer Vision.

[23]  Wei-Ying Ma,et al.  An adaptive graph model for automatic image annotation , 2006, MIR '06.

[24]  Kristen Grauman,et al.  Keywords to visual categories: Multiple-instance learning forweakly supervised object categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Yixin Chen,et al.  MILES: Multiple-Instance Learning via Embedded Instance Selection , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[27]  Eric P. Xing,et al.  Harmonium Models for Semantic Video Representation and Classification , 2007, SDM.

[28]  Zhi-Hua Zhou,et al.  Multi-Instance Multi-Label Learning with Application to Scene Classification , 2006, NIPS.

[29]  Chih-Jen Lin,et al.  Working Set Selection Using Second Order Information for Training Support Vector Machines , 2005, J. Mach. Learn. Res..