A New Geometric Approach to the Complexity of Model Selection

Model selection is one of the central problems of machine learning: given a set of competing explanations, the goal is to select the one that best captures the underlying regularities of the observations. The criterion of a good model is generalizability, and good generalization requires balancing goodness of fit against model complexity. Most existing methods agree on the goodness-of-fit term and differ in how they measure complexity; because they consider only the number of free parameters, they cannot describe the intrinsic complexity of a model and are not invariant under re-parameterization of the model. This paper studies the complexity term of model selection with a new geometric method. We propose the integral of the Gauss-Kronecker curvature of the statistical manifold as a natural measure of the non-linearity of the model manifold. This approach gives a clear, intuitive picture of the intrinsic complexity of a model, and we verify the resulting criterion experimentally.
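To make the proposal concrete, here is a minimal sketch in our own notation; the symbols below are illustrative and not taken verbatim from the paper. A parametric family $p(x \mid \theta)$, $\theta \in \Theta \subseteq \mathbb{R}^k$, forms a manifold $M$ of probability distributions, and the Fisher information

$$ g_{ij}(\theta) = \mathbb{E}_{p(x \mid \theta)}\!\left[ \frac{\partial \log p(x \mid \theta)}{\partial \theta_i} \, \frac{\partial \log p(x \mid \theta)}{\partial \theta_j} \right] $$

equips $M$ with a Riemannian metric. Writing $K(\theta)$ for the Gauss-Kronecker curvature of $M$ viewed as a submanifold of the space of distributions, and $dV = \sqrt{\det g(\theta)}\, d\theta$ for the Riemannian volume element, a complexity measure of the kind proposed here has the form

$$ C(M) = \int_M K(\theta) \, dV. $$

Because $g$, $K$, and $dV$ are defined geometrically rather than in terms of a particular coordinate system, $C(M)$ is unchanged under a smooth re-parameterization $\theta \mapsto \phi(\theta)$, whereas a raw count of free parameters has no such invariance.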
