Probabilistic score estimation with piecewise logistic regression

Well-calibrated probabilities are necessary in many applications, such as probabilistic frameworks and cost-sensitive tasks. Motivated by the previous success of the asymmetric Laplace method in calibrating text classifiers' scores, we propose piecewise logistic regression, a simple extension of standard logistic regression, as an alternative method in the discriminative family. We show that both methods have the flexibility to represent piecewise-linear functions in log-odds, but that they rest on quite different assumptions. We evaluated the asymmetric Laplace method, piecewise logistic regression, and standard logistic regression on standard text categorization collections (Reuters-21578 and TREC-AP) with three classifiers (SVM, Naive Bayes, and a logistic regression classifier), and observed that piecewise logistic regression performs significantly better than the other two methods on the log-loss metric.
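
As a minimal sketch of the piecewise idea (not the paper's exact formulation), the calibrator below maps a raw classifier score to a probability whose log-odds is piecewise linear in the score, with a separate slope on each side of a single knot. The knot at 0, the single-knot hinge basis, and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def piecewise_features(scores, knot=0.0):
    """Hinge-basis expansion of raw classifier scores around one knot.

    With these features the fitted log-odds is piecewise linear in the
    score: one slope below the knot, another above it.  The knot at 0
    (a typical decision threshold) is an illustrative choice.
    """
    s = np.asarray(scores, dtype=float)
    return np.column_stack([
        np.ones_like(s),              # intercept
        np.minimum(s - knot, 0.0),    # active left of the knot
        np.maximum(s - knot, 0.0),    # active right of the knot
    ])

def fit_piecewise_logistic(scores, labels, knot=0.0):
    """Fit the weights by minimizing log-loss (negative log-likelihood)."""
    X = piecewise_features(scores, knot)
    y_pm = 2.0 * np.asarray(labels, dtype=float) - 1.0   # labels in {-1, +1}

    def neg_log_likelihood(w):
        # log(1 + exp(-y * x.w)) summed over the calibration set, computed stably
        return np.sum(np.logaddexp(0.0, -y_pm * (X @ w)))

    w0 = np.zeros(X.shape[1])
    return minimize(neg_log_likelihood, w0, method="BFGS").x

def calibrated_probability(scores, w, knot=0.0):
    """Map raw scores to calibrated P(class = 1 | score)."""
    z = piecewise_features(scores, knot) @ w
    return 1.0 / (1.0 + np.exp(-z))
```

In this sketch, the weights would be fit on held-out scores from a trained classifier (e.g., SVM margins on a validation split) and then applied to test-set scores. Standard logistic regression calibration is recovered when the two slope weights are tied, so the extra knot only adds flexibility where the score distribution calls for it.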
