Density-based logistic regression

This paper introduces a nonlinear logistic regression model for classification. The main idea is to map the data to a feature space based on kernel density estimation. A discriminative model is then learned to optimize the feature weights as well as the bandwidth of a Nadaraya-Watson kernel density estimator. We then propose a hierarchical optimization algorithm for learning the coefficients and kernel bandwidths in an integrated way. Compared to other nonlinear models such as kernel logistic regression (KLR) and SVM, our approach is far more efficient since it solves an optimization problem with a much smaller size. Two other major advantages are that it can cope with categorical attributes in a unified fashion and naturally handle multi-class problems. Moveover, our approach inherits from logistic regression good interpretability of the model, which is important for clinical applications but not offered by KLR and SVM. Extensive results on real datasets, including a clinical prediction application currently under deployment in a major hospital, show that our approach not only achieves superior classification accuracy, but also drastically reduces the computing time as compared to other leading methods.

[1]  Yixin Chen,et al.  Early Deterioration Warning for Hospitalized Patients by Mining Clinical Data , 2011, Int. J. Knowl. Discov. Bioinform..

[2]  S. Sathiya Keerthi,et al.  A Fast Dual Algorithm for Kernel Logistic Regression , 2002, 2007 International Joint Conference on Neural Networks.

[3]  David Haussler,et al.  Using the Fisher Kernel Method to Detect Remote Protein Homologies , 1999, ISMB.

[4]  P. J. Green,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[5]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[6]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[7]  Xihong Wu,et al.  Text Segmentation with LDA-Based Fisher Kernel , 2008, ACL.

[8]  Thomas Gärtner,et al.  A survey of kernels for structured data , 2003, SKDD.

[9]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[10]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machines , 2002 .

[11]  Sayan Mukherjee,et al.  Choosing Multiple Parameters for Support Vector Machines , 2002, Machine Learning.

[12]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[13]  Yixin Chen,et al.  An integrated data mining approach to real-time clinical monitoring and deterioration warning , 2012, KDD.

[14]  Gunnar Rätsch,et al.  A New Discriminative Kernel from Probabilistic Models , 2001, Neural Computation.

[15]  David Haussler,et al.  Probabilistic kernel regression models , 1999, AISTATS.

[16]  Chih-Jen Lin,et al.  A Practical Guide to Support Vector Classication , 2008 .

[17]  Subhransu Maji,et al.  Max-margin additive classifiers for detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Hisashi Kashima,et al.  Marginalized Kernels Between Labeled Graphs , 2003, ICML.

[19]  Rajat Raina,et al.  Classification with Hybrid Generative/Discriminative Models , 2003, NIPS.

[20]  Ji Zhu,et al.  Kernel Logistic Regression and the Import Vector Machine , 2001, NIPS.