Condition Number Analysis of Kernel-based Density Ratio Estimation

The ratio of two probability densities can be used to solve various machine learning tasks such as covariate shift adaptation (importance sampling), outlier detection (likelihood-ratio test), and feature selection (mutual information). Recently, several methods for directly estimating the density ratio have been developed, e.g., kernel mean matching, maximum-likelihood density ratio estimation, and least-squares density ratio fitting. In this paper, we consider a kernelized variant of the least-squares method and investigate its theoretical properties from the viewpoint of the condition number, using smoothed analysis techniques: the condition number of the Hessian matrix determines both the convergence rate of optimization and the numerical stability of the solution. We show that the kernel least-squares method has a smaller condition number than a version of kernel mean matching and other M-estimators, implying that the kernel least-squares method has preferable numerical properties. We further give an alternative formulation of the kernel least-squares estimator, which is shown to possess an even smaller condition number. Numerical experiments agree with our theoretical analysis.
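To make the analyzed quantity concrete, the following minimal sketch (our own illustration, not the paper's implementation; the function names, the Gaussian-kernel basis centered at the numerator samples, and the ridge regularizer are all assumptions) fits a least-squares density-ratio model to two samples and reports the condition number of the regularized Hessian that governs optimization speed and numerical stability.

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    """Gaussian kernel matrix between rows of X and centers C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ls_density_ratio(x_nu, x_de, sigma=1.0, lam=1e-3):
    """Least-squares fit of the density ratio w(x) = p_nu(x) / p_de(x).

    Basis functions are Gaussian kernels centered at the numerator
    samples; the estimator solves (H + lam*I) alpha = h, where H is the
    Hessian of the empirical quadratic objective.
    """
    C = x_nu                                   # kernel centers (assumed choice)
    K_de = gaussian_kernel(x_de, C, sigma)     # (n_de, b) design on denominator sample
    K_nu = gaussian_kernel(x_nu, C, sigma)     # (n_nu, b) design on numerator sample
    H = K_de.T @ K_de / x_de.shape[0]          # Hessian of the least-squares loss
    h = K_nu.mean(axis=0)                      # linear term
    A = H + lam * np.eye(H.shape[0])           # regularized Hessian
    alpha = np.linalg.solve(A, h)              # density-ratio coefficients
    cond = np.linalg.cond(A)                   # condition number studied in the paper
    ratio = lambda x: gaussian_kernel(x, C, sigma) @ alpha
    return ratio, cond

# Toy example: numerator N(0, 1), denominator N(0.5, 1.5^2)
rng = np.random.default_rng(0)
x_nu = rng.normal(0.0, 1.0, size=(200, 1))
x_de = rng.normal(0.5, 1.5, size=(300, 1))
w_hat, cond = ls_density_ratio(x_nu, x_de, sigma=0.7, lam=1e-2)
print("condition number of regularized Hessian:", cond)
```

Larger regularization lam shrinks the condition number of A at the cost of additional bias, which is the trade-off the condition-number analysis makes precise.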
