Complex Object Classification: A Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport

In real-world applications, complex objects are usually associated with multiple labels and can be represented by multiple modalities; for example, a complex article contains both text and image information and carries multiple annotations. Previous methods assume that homogeneous multi-modal data are consistent, whereas in real applications the raw data are disordered, i.e., an article consists of a variable number of inconsistent text and image instances. To address this problem, Multi-modal Multi-instance Multi-label (M3) learning provides a framework for handling such tasks and has exhibited excellent performance. In addition, how to effectively exploit label correlation remains a challenging issue. In this paper, we propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which learns label prediction and exploits label correlation simultaneously based on Optimal Transport, by enforcing consistency between the bag-level predictions of different modalities under a learned latent ground label metric. Experiments on benchmark datasets and the real-world WKG Game-Hub dataset validate the effectiveness of the proposed method.
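To make the Optimal Transport-based consistency principle concrete, the following is a minimal sketch rather than the paper's exact formulation: it computes an entropy-regularized (Sinkhorn) optimal transport distance between the bag-level label predictions of two modalities under a given ground label cost matrix, which could serve as a cross-modal consistency loss. The function name sinkhorn_ot, the toy predictions, and the placeholder cost matrix are illustrative assumptions; in M3DN the ground label metric is learned rather than fixed.

import numpy as np

def sinkhorn_ot(p, q, cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT distance between two label distributions.

    p, q : (L,) nonnegative vectors summing to 1, e.g. normalized
           bag-level predictions from the text and image modalities.
    cost : (L, L) ground metric over labels (learned in M3DN; fixed here).
    """
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):           # Sinkhorn fixed-point updates
        v = q / (K.T @ u + 1e-12)
        u = p / (K @ v + 1e-12)
    plan = np.outer(u, v) * K          # optimal transport plan
    return float(np.sum(plan * cost))  # regularized OT cost

# Toy usage: two modalities predicting over L = 4 labels.
rng = np.random.default_rng(0)
text_pred = rng.random(4);  text_pred /= text_pred.sum()
image_pred = rng.random(4); image_pred /= image_pred.sum()
label_cost = 1.0 - np.eye(4)           # placeholder 0/1 ground metric
consistency_loss = sinkhorn_ot(text_pred, image_pred, label_cost)
print(consistency_loss)

With a 0/1 ground metric, this regularized OT cost is a soft measure of how much prediction mass must move between labels, so it penalizes the two modalities for disagreeing about which labels a bag should receive; a learned, non-uniform metric additionally encodes which label confusions are cheap and which are costly.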
