t-Logistic Regression

We extend logistic regression by using t-exponential families, which were introduced recently in statistical physics. This gives rise to a regularized risk minimization problem with a non-convex loss function. An efficient block coordinate descent scheme can be derived for estimating the parameters. Because the loss grows only slowly on badly misclassified points, our algorithm is tolerant to label noise. Furthermore, unlike other algorithms that employ non-convex loss functions, our algorithm is fairly robust to the choice of initial values. We verify both observations empirically on a number of synthetic and real datasets.
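
To make the loss concrete, here is a minimal Python sketch of a binary t-logistic negative log-likelihood. It rests on assumptions not spelled out above: a score u = <phi(x), theta>, labels y in {-1, +1}, and the model p(y | x) = exp_t(y u/2 - g_t(u)), where the normalizer g_t is found numerically by bisection. The function names (exp_t, log_partition, t_logistic_loss) are hypothetical, only t >= 1 is handled, and the block coordinate descent optimizer itself is not reproduced. At t = 1 the sketch reduces to the ordinary logistic loss log(1 + exp(-y u)).

```python
import math


def exp_t(x: float, t: float) -> float:
    """t-exponential: [1 + (1 - t) x]_+ ** (1 / (1 - t)); equals exp(x) at t = 1."""
    if t == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - t) * x
    if base <= 0.0:
        # For t > 1 the function diverges as x approaches 1 / (t - 1); for t < 1 it is clipped to 0.
        return math.inf if t > 1.0 else 0.0
    return base ** (1.0 / (1.0 - t))


def log_partition(u: float, t: float, iters: int = 200) -> float:
    """Hypothetical helper: solve exp_t(u/2 - g) + exp_t(-u/2 - g) = 1 for g (t >= 1 only)."""
    if t < 1.0:
        raise ValueError("this sketch only handles t >= 1")
    a = abs(u) / 2.0
    if t == 1.0:
        # Closed form g = log(e^{u/2} + e^{-u/2}), written in a numerically stable way.
        return a + math.log1p(math.exp(-abs(u)))

    def excess(g: float) -> float:
        return exp_t(u / 2.0 - g, t) + exp_t(-u / 2.0 - g, t) - 1.0

    # excess(g) is strictly decreasing in g and positive at g = a (one term equals 1),
    # so bracket the root in [a, hi] and bisect.
    lo, hi = a, a + 1.0
    while excess(hi) > 0.0:
        hi = a + 2.0 * (hi - a)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_logistic_loss(u: float, y: int, t: float) -> float:
    """Negative log-likelihood -log p(y | x) under the assumed binary t-exponential model."""
    g = log_partition(u, t)
    if t == 1.0:
        return g - y * u / 2.0  # equals log(1 + exp(-y u)), the logistic loss
    return -math.log(exp_t(y * u / 2.0 - g, t))


if __name__ == "__main__":
    # Compare loss growth on increasingly badly misclassified points (y = +1, u < 0).
    for u in (-2.0, -20.0, -200.0):
        print(f"u={u:7.1f}  logistic={t_logistic_loss(u, 1, 1.0):8.3f}  "
              f"t=1.5: {t_logistic_loss(u, 1, 1.5):8.3f}")
```

Running the script prints both losses side by side. Under these assumptions, for t = 1.5 the loss on a misclassified point grows roughly logarithmically in |u| rather than linearly, so a single mislabeled example has far less pull on the regularized risk than it does under the logistic loss.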
