Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement: A Step Towards the Screenless Retailing

Nowadays, almost all online orders are placed through screened devices such as mobile phones, tablets, and computers. With the rapid development of the Internet of Things (IoT) and smart appliances, more and more screenless smart devices, e.g., smart speakers and smart refrigerators, are appearing in our daily lives. They open up new means of interaction and may provide an excellent opportunity to reach new customers and increase sales. However, not all items are suitable for screenless shopping, since the appearance of some items plays an important role in consumer decision making. Typical examples include clothes, dolls, bags, and shoes. In this paper, we aim to infer the significance of each item's appearance in consumer decision making and identify the group of items that are suitable for screenless shopping. Specifically, we formulate the problem as a classification task that predicts whether an item's appearance has a significant impact on people's purchase behavior. To solve this problem, we extract multi-modal features from three different views and collect a set of necessary labels via crowdsourcing. We then propose an iterative semi-supervised learning framework with a carefully designed multi-modal enhancement module. Experimental results verify the effectiveness of the proposed method.
