Fast Multi-Class Probabilistic Classifier by Sparse Non-parametric Density Estimation

Model interpretation is essential in many application scenarios, and a classification model that is easy to interpret may provide useful information for further study and improvement. It is common to encounter a large number of variables in modern data analysis, especially when data are collected automatically. Such datasets may not be collected with a specific analysis goal in mind and often contain redundant features that contribute nothing to the analysis task at hand. Variable selection is a common way to improve model interpretability and is widely used with parametric classification models. However, variable selection has received little study in nonparametric classification models, such as density estimation-based methods, and this is especially true for multiple-class classification. In this study, we address multiple-class classification problems using sparse nonparametric density estimation and propose a method for identifying high-impact variables for each class. We present the asymptotic properties and the computational procedure of the proposed method, together with suggested sample sizes. We also report numerical results on both synthetic and real data sets.
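The abstract does not spell out the estimator, so the following Python sketch only illustrates the general idea of a density estimation-based multi-class classifier with class-specific variable selection. It is not the authors' procedure, and several choices are assumptions: each class-conditional density is a product of univariate Gaussian kernel density estimates (a naive-Bayes-style simplification), bandwidths follow the normal-reference rule of thumb, a variable is kept for a class by a hypothetical standardized-mean-difference screening rule with an arbitrary threshold, and unselected variables fall back to a pooled, class-independent density. The function names `fit`, `predict_proba`, and `screen_variables` are hypothetical.

```python
import math
import random
from collections import defaultdict


def rule_of_thumb_bandwidth(values):
    """Normal-reference (Silverman-style) bandwidth 1.06 * sd * n^(-1/5)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / max(n - 1, 1)
    sd = math.sqrt(var) if var > 0 else 1e-3  # guard against constant columns
    return 1.06 * sd * n ** (-0.2)


def gaussian_kde_1d(values):
    """Return a function x -> univariate Gaussian KDE built from `values`."""
    h = rule_of_thumb_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return density


def screen_variables(X_k, X_rest, threshold):
    """Hypothetical screening rule: keep variable j for class k when the
    standardized mean difference between class k and all other classes
    exceeds `threshold`. Assumes at least two classes (X_rest non-empty)."""
    selected = []
    for j in range(len(X_k[0])):
        a = [row[j] for row in X_k]
        b = [row[j] for row in X_rest]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        resid = [v - ma for v in a] + [v - mb for v in b]
        sd = math.sqrt(sum(d * d for d in resid) / max(len(resid) - 2, 1))
        if sd > 0 and abs(ma - mb) / sd > threshold:
            selected.append(j)
    return selected


def fit(X, y, threshold=0.5):
    """Estimate priors, per-class selected variables, and the needed KDEs."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    p = len(X[0])
    # Pooled (class-independent) marginal density for every variable.
    pooled = [gaussian_kde_1d([row[j] for row in X]) for j in range(p)]
    model = {}
    for k, X_k in by_class.items():
        X_rest = [row for row, label in zip(X, y) if label != k]
        S_k = screen_variables(X_k, X_rest, threshold)
        kdes = {j: gaussian_kde_1d([row[j] for row in X_k]) for j in S_k}
        model[k] = {"prior": len(X_k) / len(X), "selected": S_k, "kdes": kdes}
    return pooled, model


def predict_proba(x, pooled, model):
    """Posterior P(class | x) proportional to prior * product of densities."""
    log_scores = {}
    for k, m in model.items():
        s = math.log(m["prior"])
        for j, xj in enumerate(x):
            f = m["kdes"][j](xj) if j in m["kdes"] else pooled[j](xj)
            s += math.log(max(f, 1e-300))  # avoid log(0)
        log_scores[k] = s
    top = max(log_scores.values())  # log-sum-exp style normalization
    unnorm = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


if __name__ == "__main__":
    random.seed(0)
    X, y = [], []
    for k in range(3):
        for _ in range(60):
            row = [random.gauss(0, 1) for _ in range(4)]
            row[k] += 3.0  # class k is shifted in variable k; variable 3 is noise
            X.append(row)
            y.append(k)
    pooled, model = fit(X, y)
    for k in range(3):
        print(k, model[k]["selected"])
    print(predict_proba([3.0, 0.0, 0.0, 0.0], pooled, model))
```

Because every class multiplies by the same pooled factor for a variable it does not select, a variable that no class selects cancels out of the normalized posterior; this is the sense in which the sketch is sparse. A real method would replace the ad hoc screening rule and product-kernel assumption with a principled sparse density estimator.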
