Paper information - Superfast-Trainable Multi-Class Probabilistic Classifier by Least-Squares Posterior Fitting

Kernel logistic regression (KLR) is a powerful and flexible classification algorithm that can also provide the confidence of its class predictions. However, its training, typically carried out by (quasi-)Newton methods, is rather time-consuming. In this paper, we propose an alternative probabilistic classification algorithm called the Least-Squares Probabilistic Classifier (LSPC). KLR models the class-posterior probability by a log-linear combination of kernel functions, and its parameters are learned by (regularized) maximum likelihood. In contrast, LSPC employs a linear combination of kernel functions, and its parameters are learned by regularized least-squares fitting of the true class-posterior probability. Thanks to this linear regularized least-squares formulation, the solution of LSPC can be computed analytically just by solving a regularized system of linear equations in a class-wise manner. Thus, LSPC is computationally very efficient and numerically stable. Through experiments, we show that the computation of LSPC is faster than that of KLR by orders of magnitude, with comparable classification accuracy.
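The class-wise analytic solution described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the kernel, the choice of kernel centers, or how the fitted values are turned into valid probabilities, so the following are assumptions made for illustration: a Gaussian kernel centered at every training point, the hypothetical parameters `sigma` (kernel width) and `lam` (regularization strength), and a final step that clips negative outputs to zero and renormalizes. The function names are hypothetical as well.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lspc_fit(X, y, n_classes, sigma=1.0, lam=0.1):
    """Least-squares fit of a linear-in-kernel posterior model.

    Assumption: kernel centers are all training points. Returns an
    (n, n_classes) coefficient matrix with one column per class.
    """
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)      # (n, n) kernel design matrix
    H = K.T @ K / n                       # shared across all classes
    Y = np.eye(n_classes)[y]              # one-hot labels, shape (n, n_classes)
    h = K.T @ Y / n                       # column c sums kernels over class-c samples
    # One regularized linear system per class, solved together in a single call.
    return np.linalg.solve(H + lam * np.eye(n), h)

def lspc_predict_proba(X_train, alpha, X_new, sigma=1.0):
    K = gaussian_kernel(X_new, X_train, sigma)
    scores = np.maximum(K @ alpha, 0.0)   # assumption: clip negative outputs to zero
    total = scores.sum(axis=1, keepdims=True)
    n_classes = alpha.shape[1]
    # Renormalize; fall back to a uniform distribution if every score is zero.
    safe_total = np.where(total > 0, total, 1.0)
    return np.where(total > 0, scores / safe_total, 1.0 / n_classes)

# Usage on two synthetic, well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (30, 2)), rng.normal(2.0, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
alpha = lspc_fit(X, y, n_classes=2)
P = lspc_predict_proba(X, alpha, X)
```

Because the matrix `H + lam * I` does not depend on the class, a single call to `np.linalg.solve` with a multi-column right-hand side handles all classes, with no iterative optimization of the kind KLR needs.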

Masashi Sugiyama
