Model selection in high dimensions: a quadratic‐risk‐based approach

We propose a general class of risk measures that can be used for data-based evaluation of parametric models. The loss function is defined as the generalized quadratic distance between the true density and the proposed model. These distances have a simple quadratic form structure that can be adapted through the choice of a non-negative definite kernel and a bandwidth parameter. Using asymptotic results for the quadratic distances, we build a quick-to-compute approximation for the risk function. Its derivation is analogous to that of the Akaike information criterion (AIC) but, unlike the AIC, the quadratic risk is a global comparison tool. The method does not require resampling, which is a great advantage when point estimators are expensive to compute. We illustrate the method on the problem of selecting the number of components in a mixture model, where we show that, with an appropriate kernel, it is computationally straightforward in arbitrarily high data dimensions. In the same context, we show that the method has some clear advantages over the AIC and the Bayesian information criterion (BIC). Copyright 2008 Royal Statistical Society.
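To make the idea of a kernel-based quadratic distance concrete, the following is a minimal sketch, not the risk estimator of the paper. It assumes a one-dimensional Gaussian kernel with bandwidth h and a univariate normal mixture model, for which every integral in the quadratic form has a closed form. The function names (normal_pdf, quadratic_distance), the bandwidth value and the simulated data are all hypothetical choices made for illustration. The paper's risk approximation adds a correction derived from asymptotic theory, which is not reproduced here.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def quadratic_distance(x, weights, means, variances, h):
    """Empirical quadratic distance between a sample and a normal mixture.

    Assumes the kernel K_h(s, t) = N(s - t; 0, h^2). The distance is the
    quadratic form  integral integral K_h d(F_n - G) d(F_n - G),  where F_n
    is the empirical distribution of x and G is the mixture model.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)

    # Data-data term: average of the kernel over all pairs (i, j).
    data_data = normal_pdf(x[:, None], x[None, :], h ** 2).mean()

    # Data-model term: the kernel integrated against each mixture component
    # is a normal density with variance inflated by h^2.
    data_model = (normal_pdf(x[:, None], m[None, :], v[None, :] + h ** 2) @ w).mean()

    # Model-model term: double integral over pairs of components.
    pair_var = v[:, None] + v[None, :] + h ** 2
    model_model = w @ normal_pdf(m[:, None], m[None, :], pair_var) @ w

    return data_data - 2.0 * data_model + model_model

# Simulated data from a well-separated two-component mixture (illustrative).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 250), rng.normal(3.0, 1.0, 250)])

# Candidate 1: a single normal fitted by moments.
d_one = quadratic_distance(x, [1.0], [x.mean()], [x.var()], h=1.0)

# Candidate 2: the two-component mixture that generated the data.
d_two = quadratic_distance(x, [0.5, 0.5], [-3.0, 3.0], [1.0, 1.0], h=1.0)

print(f"one component: {d_one:.5f}, two components: {d_two:.5f}")
```

In this sketch the two-component candidate gives the smaller distance, because the kernel is non-negative definite and the distance shrinks as the model approaches the data-generating density. A model selection rule would compare penalized versions of these distances across candidate numbers of components.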
