Sinkhorn Regression

This paper introduces a novel robust regression (RR) model, named Sinkhorn regression, which imposes Sinkhorn distances on both the loss function and the regularization. Traditional RR methods search for an element-wise loss function (e.g., an Lp-norm) to characterize the errors so that outlying data have relatively little influence on the regression estimator. Because they neglect geometric information, they often yield sub-optimal results in practical applications. To address this problem, we use a cross-bin distance function, the Sinkhorn distance, to capture geometric knowledge from real data. The Sinkhorn distance is invariant to translation, rotation, and scaling, so our method is more robust to variations in the data than traditional regression models. Meanwhile, we leverage the Kullback-Leibler divergence to relax the marginal constraints of the proposed model into an unbalanced formulation that accommodates more types of features. In addition, we propose an efficient algorithm to solve the relaxed model and establish complete statistical guarantees for it under mild conditions. Experiments on five publicly available microarray data sets and one mass spectrometry data set demonstrate the effectiveness and robustness of our method.
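To make the two core ingredients concrete, here is a minimal NumPy sketch of (i) the entropic-regularized Sinkhorn distance computed by Sinkhorn–Knopp matrix scaling, and (ii) its unbalanced variant in which the hard marginal constraints are relaxed via KL penalties. This is a generic illustration of those standard algorithms, not the paper's actual implementation; all function names, the toy histograms, and the parameter values (`reg`, `rho`, iteration count) are illustrative assumptions.

```python
import numpy as np

def sinkhorn_distance(a, b, C, reg=0.1, n_iters=200):
    """Entropic-regularized OT cost between histograms a, b with
    ground-cost matrix C, via Sinkhorn-Knopp scaling iterations."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # scale columns toward marginal b
        u = a / (K @ v)                  # scale rows toward marginal a
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return np.sum(P * C)                 # transport cost under plan P

def unbalanced_sinkhorn(a, b, C, reg=0.1, rho=1.0, n_iters=200):
    """Unbalanced variant: marginal constraints replaced by KL penalties
    of strength rho; the only change is the exponent on the updates."""
    K = np.exp(-C / reg)
    fi = rho / (rho + reg)               # damping from the KL relaxation
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        v = (b / (K.T @ u)) ** fi
        u = (a / (K @ v)) ** fi
    P = u[:, None] * K * v[None, :]
    return np.sum(P * C)

# Toy example: two 1-D histograms on a common grid. The quadratic
# ground cost is what gives the distance its cross-bin, geometric
# character: mass moved between nearby bins is cheap.
x = np.arange(5, dtype=float)
C = (x[:, None] - x[None, :]) ** 2       # squared-distance ground cost
a = np.array([0.6, 0.2, 0.1, 0.05, 0.05])
b = np.array([0.05, 0.05, 0.1, 0.2, 0.6])
d = sinkhorn_distance(a, b, C)
```

Unlike an element-wise Lp loss, which compares `a[i]` only with `b[i]`, the cost above depends on how far mass must travel across bins, which is the geometric sensitivity the abstract appeals to.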
