Combining Classification and User-Based Collaborative Filtering for Matching Footwear Size

Size mismatch is a serious problem in online footwear purchases because a wrong size almost always leads to a return. Selecting a size depends not only on foot measurements but also on user preference. For this reason, we propose several methodologies that combine the information provided by a classifier based on anthropometric measurements with user preference information obtained through user-based collaborative filtering. The novelties are as follows: (1) the information sources are 3D foot measurements from a low-cost 3D foot digitizer, past purchases and self-reported size; (2) we propose using an ordinal classifier after imputing missing data with several options based on collaborative filtering; (3) we also propose an ensemble of ordinal classification and collaborative filtering results; and (4) several methodologies based on clustering and archetype analysis are introduced as user-based collaborative filtering for the first time. The hybrid methodologies were tested in a simulation study and were also applied to a dataset of Spanish footwear users. The results show that combining the information from both sources predicts footwear size more accurately and that the new proposals are more accurate than the classic alternatives considered.
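The abstract does not specify the similarity measure, the neighbourhood size or the ensemble rule, so the following Python sketch only illustrates the general idea of an ensemble of a classifier and user-based collaborative filtering. It assumes a plain k-nearest-neighbour search (not the clustering or archetype-based variants proposed above), a mean absolute size difference as the user distance, neighbour weights of 1/(1 + d), and a convex combination of probability vectors with weight 0.5. The function names `cf_size_distribution` and `ensemble_predict`, the size grid, the purchase matrix and the classifier probabilities are all hypothetical; the probabilities merely stand in for the output of an ordinal classifier trained on foot measurements.

```python
import numpy as np


def cf_size_distribution(sizes_bought, target, item, size_grid, k=3):
    """Hypothetical user-based CF estimate of P(size) for (target user, item).

    sizes_bought: users x shoe-models array of purchased sizes, np.nan = unknown.
    Neighbours are the k users who own `item` and are closest to `target`
    in mean absolute size difference over the models both users bought.
    """
    t = sizes_bought[target]
    candidates = []
    for u, row in enumerate(sizes_bought):
        if u == target or np.isnan(row[item]):
            continue  # skip the target itself and users without this model
        common = ~np.isnan(t) & ~np.isnan(row)
        common[item] = False  # never compare on the item being predicted
        if not common.any():
            continue
        dist = np.mean(np.abs(t[common] - row[common]))
        candidates.append((dist, u))
    candidates.sort()  # closest users first
    probs = np.zeros(len(size_grid))
    for dist, u in candidates[:k]:
        idx = int(np.argmin(np.abs(size_grid - sizes_bought[u, item])))
        probs[idx] += 1.0 / (1.0 + dist)  # closer neighbours weigh more
    if probs.sum() == 0:
        # No usable neighbours: fall back to a uniform distribution.
        return np.full(len(size_grid), 1.0 / len(size_grid))
    return probs / probs.sum()


def ensemble_predict(p_classifier, p_cf, size_grid, weight=0.5):
    """Convex combination of two distributions over sizes; returns (size, probs)."""
    p = weight * np.asarray(p_classifier) + (1 - weight) * np.asarray(p_cf)
    p = p / p.sum()
    return size_grid[int(np.argmax(p))], p


# Hypothetical data: EU sizes and purchases of three shoe models (A, B, C).
size_grid = np.array([39.0, 40.0, 41.0, 42.0, 43.0])
nan = np.nan
bought = np.array([
    [40, 41, nan],  # target user: size for model C is unknown
    [40, 41, 41],
    [40, 40, 41],
    [42, 43, 43],
    [39, 40, 40],
])

# Stand-in for an ordinal classifier's output from foot measurements.
p_clf = np.array([0.05, 0.45, 0.40, 0.08, 0.02])

p_cf = cf_size_distribution(bought, target=0, item=2, size_grid=size_grid, k=3)
pred, p_ens = ensemble_predict(p_clf, p_cf, size_grid, weight=0.5)
print("CF distribution:", np.round(p_cf, 3))
print("Ensemble prediction:", pred)
```

In this toy example the classifier alone would pick size 40, while the collaborative filtering part, driven by users with similar past purchases, shifts the combined prediction to size 41. The weight could instead be tuned on validation data; a fixed 0.5 is used here only for simplicity.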
