Learning in Parametric Modeling: Basic Concepts and Directions

The chapter presents an overview of the basic directions in machine learning and introduces the basic notions related to parametric modeling. The tasks of regression and classification are defined, and basic concepts of parameter estimation, such as estimator efficiency, the Cramér-Rao bound, and sufficient statistics, are outlined. The least-squares estimator and some of its properties are discussed. The notions of inverse problems, overfitting, the bias-variance dilemma, and regularization are presented. The methods of maximum likelihood, maximum a posteriori, and Bayesian inference are introduced. The curse of dimensionality and the cross-validation technique are also covered. The chapter closes with a discussion of nonparametric models, with an emphasis on Parzen windows and the k-nearest neighbor approach to density estimation.
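To make the least-squares, regularization, and cross-validation ideas above concrete, here is a minimal NumPy sketch. It is not taken from the chapter. It fits a deliberately over-parameterized model with the least-squares estimator, fits the same model with ridge (L2-regularized least squares), and uses k-fold cross-validation to pick the regularization parameter. The polynomial basis, the toy data, the λ grid, and the helper names `design_matrix`, `ridge`, and `cv_error` are all illustrative assumptions.

```python
import numpy as np


def design_matrix(x: np.ndarray, degree: int) -> np.ndarray:
    """Polynomial basis [1, x, ..., x**degree]; an illustrative (assumed) model choice."""
    return np.vander(x, N=degree + 1, increasing=True)


def ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Solve (X^T X + lam*I) theta = X^T y; lam = 0 gives the least-squares estimate.

    The normal equations are used for readability; np.linalg.lstsq or a
    QR/SVD-based solver is numerically safer when X is ill-conditioned.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def cv_error(X: np.ndarray, y: np.ndarray, lam: float, k: int = 5, seed: int = 0) -> float:
    """Mean squared validation error of ridge(lam) under k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))  # shuffle before splitting
    folds = np.array_split(idx, k)
    errors = []
    for val in folds:
        train = np.setdiff1d(idx, val)  # all indices not in the validation fold
        theta = ridge(X[train], y[train], lam)
        errors.append(np.mean((y[val] - X[val] @ theta) ** 2))
    return float(np.mean(errors))


# Hypothetical toy data: noisy samples of sin(pi * x) on [-1, 1].
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-1.0, 1.0, size=15))
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(x.size)

X = design_matrix(x, degree=9)  # 10 parameters for 15 points: prone to overfitting

theta_ls = ridge(X, y, lam=0.0)      # unregularized least squares
theta_ridge = ridge(X, y, lam=1e-2)  # regularized (biased) estimate

lambdas = [1e-6, 1e-4, 1e-2, 1e-1, 1.0]
cv_errors = {lam: cv_error(X, y, lam) for lam in lambdas}
best_lam = min(cv_errors, key=cv_errors.get)

print("||theta_LS||    =", np.linalg.norm(theta_ls))
print("||theta_ridge|| =", np.linalg.norm(theta_ridge))
print("CV error per lambda:", cv_errors)
print("lambda selected by 5-fold CV:", best_lam)
```

Setting `lam=0` recovers ordinary least squares, which minimizes the training error but, with this many parameters, typically yields large, erratic coefficients. A positive `lam` shrinks the coefficient norm at the cost of some bias, and cross-validation estimates which amount of shrinkage generalizes best on this particular toy problem.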
