Exchangeable Deep Neural Networks for Set-to-Set Matching and Learning

Matching two different sets of items, known as the heterogeneous set-to-set matching problem, has recently attracted attention. The difficulties lie in extracting features that match the correct pair of sets while preserving the two types of exchangeability that set-to-set matching requires: the pair of sets, as well as the items within each set, should be exchangeable. In this study, we propose a novel deep learning architecture that addresses these difficulties, together with an efficient training framework for set-to-set matching. We evaluate the methods through experiments based on two industrial applications: fashion set recommendation and group re-identification. In these experiments, the proposed method shows significant improvements over state-of-the-art methods, thereby validating our architecture for the heterogeneous set matching problem.
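The two exchangeability requirements can be illustrated with a toy scoring function. The sketch below is not the architecture proposed in this study; it assumes a simple stand-in (mean pooling of item features followed by cosine similarity), and the names `pool` and `match_score` are hypothetical. It only shows what it means for a score to be unchanged when the items within a set are reordered and when the two sets are swapped.

```python
import math
from itertools import permutations


def pool(items):
    """Mean-pool a set of feature vectors (assumed stand-in, not the paper's encoder).

    Averaging over items makes the result independent of item order.
    """
    n = len(items)
    dim = len(items[0])
    return [math.fsum(v[d] for v in items) / n for d in range(dim)]


def match_score(set_x, set_y):
    """Cosine similarity between pooled sets; symmetric in its two arguments."""
    px, py = pool(set_x), pool(set_y)
    dot = math.fsum(a * b for a, b in zip(px, py))
    norm_x = math.sqrt(math.fsum(a * a for a in px))
    norm_y = math.sqrt(math.fsum(b * b for b in py))
    return dot / (norm_x * norm_y)


# Two small sets of 2-D item features (made-up illustrative values).
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
Y = [[0.2, 0.9], [1.5, 0.1]]

base = match_score(X, Y)

# Exchangeability of the pair of sets: swapping X and Y leaves the score unchanged.
swapped = match_score(Y, X)

# Exchangeability of items: every reordering of X gives the same score.
reordered = [match_score(list(p), Y) for p in permutations(X)]
```

Pooling before comparison is the simplest way to obtain both properties, but it discards interactions between individual items of the two sets, which is part of why feature extraction for this problem is described above as difficult.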
