Semi-Supervised Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport

Complex objects usually have multiple labels and can be represented in multiple modalities; e.g., complex articles contain text and image information as well as multiple annotations. Previous methods assume that multi-modal data are homogeneous and consistent, whereas in real applications the raw data are disordered; e.g., an article consists of a variable number of inconsistent text and image instances. Multi-modal Multi-instance Multi-label (M3) learning therefore provides a framework for handling such tasks and has exhibited excellent performance. However, M3 learning faces two main challenges: 1) how to effectively utilize label correlations, and 2) how to take advantage of multi-modal learning to process unlabeled instances. To solve these problems, we first propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which models M3 learning with an end-to-end multi-modal deep network and utilizes the consistency principle among the bag-level predictions of different modalities. Based on M3DN, we learn the latent ground label metric with optimal transport. Moreover, we introduce extrinsic unlabeled multi-modal multi-instance data and propose M3DNS, which uses an instance-level auto-encoder for each single modality and a modified bag-level optimal transport to strengthen the consistency among modalities. Thereby, M3DNS can better predict labels and exploit label correlations simultaneously. Experiments on benchmark datasets and the real-world WKG Game-Hub dataset validate the effectiveness of the proposed methods.
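To make the optimal-transport ingredient more concrete, the following minimal sketch computes an entropy-regularized transport cost (Sinkhorn iterations, Cuturi 2013) between the bag-level label predictions of two modalities under a label ground metric M. It is not the authors' M3DN or M3DNS implementation. Max-pooling over instances, the fixed hand-written metric M, eps = 0.5, the iteration count, and all function names (`bag_prediction`, `normalize`, `sinkhorn_plan`, `sinkhorn_cost`) are hypothetical choices made for illustration. In the proposed methods the ground metric is learned rather than fixed.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bag_prediction(instance_scores: Matrix) -> Vector:
    """Aggregate per-instance label scores into one bag-level score vector.

    Max-pooling over instances is an assumed choice for this sketch.
    """
    return [max(column) for column in zip(*instance_scores)]


def normalize(scores: Vector) -> Vector:
    """Rescale non-negative label scores into a probability vector."""
    total = sum(scores)
    return [s / total for s in scores]


def sinkhorn_plan(a: Vector, b: Vector, M: Matrix,
                  eps: float = 0.5, n_iter: int = 500) -> Matrix:
    """Entropy-regularized transport plan between histograms a and b.

    M[i][j] is the ground cost of moving mass from label i to label j.
    eps is an assumed regularization strength: smaller values approach
    exact optimal transport but need more iterations to converge.
    """
    k = len(a)
    # Gibbs kernel G_ij = exp(-M_ij / eps)
    G = [[math.exp(-M[i][j] / eps) for j in range(k)] for i in range(k)]
    u, v = [1.0] * k, [1.0] * k
    for _ in range(n_iter):
        # Alternately rescale rows toward marginal a and columns toward b.
        u = [a[i] / sum(G[i][j] * v[j] for j in range(k)) for i in range(k)]
        v = [b[j] / sum(G[i][j] * u[i] for i in range(k)) for j in range(k)]
    # Plan T_ij = u_i * G_ij * v_j
    return [[u[i] * G[i][j] * v[j] for j in range(k)] for i in range(k)]


def sinkhorn_cost(a: Vector, b: Vector, M: Matrix, **kwargs) -> float:
    """Transport cost <T, M> of the Sinkhorn plan T."""
    T = sinkhorn_plan(a, b, M, **kwargs)
    return sum(T[i][j] * M[i][j] for i in range(len(a)) for j in range(len(b)))


if __name__ == "__main__":
    # Hypothetical ground metric over 3 labels: labels 0 and 1 are related.
    M = [[0.0, 0.2, 1.0],
         [0.2, 0.0, 1.0],
         [1.0, 1.0, 0.0]]
    # Hypothetical per-instance scores: an image bag (2 instances)
    # and a text bag (3 instances) describing the same article.
    image_bag = [[0.9, 0.3, 0.1], [0.4, 0.2, 0.1]]
    text_bag = [[0.2, 0.8, 0.1], [0.3, 0.4, 0.1], [0.1, 0.2, 0.2]]
    p_image = normalize(bag_prediction(image_bag))
    p_text = normalize(bag_prediction(text_bag))
    # Disagreement between the two modalities, usable as a consistency penalty.
    print(sinkhorn_cost(p_image, p_text, M))
```

Because moving mass between related labels (low ground cost) is cheaper than moving it between unrelated labels, a loss of this form penalizes modal disagreement less when the modalities confuse related labels. Learning M from data, as the abstract describes, is one way such a loss can capture label correlations.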
