A New Approach to Fitting Linear Models in High Dimensional Spaces

This thesis presents a new approach to fitting linear models, called “pace regression”, which also overcomes the dimensionality determination problem. Its optimality in minimizing the expected prediction loss is established theoretically as the number of free parameters becomes infinitely large; in this sense, pace regression outperforms existing procedures for fitting linear models. Dimensionality determination, a special case of fitting linear models, turns out to be a natural by-product. A range of simulation studies is conducted, and the results support the theoretical analysis.

Throughout the thesis, a deeper understanding is gained of the problem of fitting linear models, and many key issues are discussed. Existing procedures, namely OLS, AIC, BIC, RIC, CIC, CV(d), BS(m), RIDGE, NN-GAROTTE and LASSO, are reviewed and compared with the new methods, both theoretically and empirically.

Estimating a mixing distribution is an indispensable part of pace regression. A measure-based minimum distance approach is proposed, covering both probability measures and nonnegative measures, and strongly consistent estimators are produced. Of all minimum distance methods for estimating a mixing distribution, only the nonnegative-measure-based one solves the minority cluster problem, which is vital for pace regression.

Pace regression has striking advantages over existing techniques for fitting linear models. It also has more general implications for empirical modeling, which are discussed in the thesis.
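To make the distinction between probability-measure-based and nonnegative-measure-based minimum distance estimation of a mixing distribution concrete, the following is a minimal sketch of the nonnegative-measure idea only, not the estimator developed in the thesis. It assumes details the abstract does not specify: Gaussian components with known unit variance, a fixed grid of candidate support points, and a least-squares distance between the empirical CDF and the mixture CDF, solved by nonnegative least squares.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import norm

def minimum_distance_mixing_weights(x, support, sigma=1.0):
    """Illustrative nonnegative-measure-based minimum distance estimate.

    Fits nonnegative masses on a fixed grid of support points by
    least-squares matching of the mixture CDF to the empirical CDF.
    This is a sketch for illustration only, not the thesis's method.
    """
    x = np.sort(np.asarray(x))
    n = len(x)
    # Empirical CDF evaluated at the sorted observations.
    ecdf = (np.arange(1, n + 1) - 0.5) / n
    # Column j holds the N(support[j], sigma^2) CDF at each observation.
    A = norm.cdf((x[:, None] - support[None, :]) / sigma)
    # Nonnegative least squares yields a nonnegative measure on the grid;
    # unlike the probability-measure variant, the masses need not sum to one.
    weights, _ = nnls(A, ecdf)
    return weights

# Usage sketch with data from a two-component normal mixture.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1, 100)])
support = np.linspace(-2, 6, 33)
w = minimum_distance_mixing_weights(x, support)
print("estimated total mass:", w.sum())
print("support points with non-trivial mass:", support[w > 0.01])
```

Relaxing the constraint from probability measures to arbitrary nonnegative measures is what the abstract credits with solving the minority cluster problem; the sketch above only shows the mechanical difference (no sum-to-one constraint), not that property.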
