Nonlinear partial least squares

Nonlinear Partial Least Squares

Edward Carl Malthouse

ABSTRACT

We propose a new nonparametric regression method for high-dimensional data, nonlinear partial least squares (NLPLS). NLPLS is motivated by projection-based regression methods such as partial least squares (PLS), projection pursuit regression (PPR), and feedforward neural networks. The model takes the form of a composition of two functions: the first projects the predictor variables onto a lower-dimensional curve or surface, yielding scores, and the second predicts the response variable from the scores. We implement NLPLS with feedforward neural networks. NLPLS will often produce a more parsimonious model (fewer score vectors) than other projection-based methods, and the model is well suited for detecting outliers and future covariates requiring extrapolation. The scores are also shown to have useful interpretations. We also extend the model to multiple response variables and discuss when multiple response variables should be modeled simultaneously and when they should be modeled with separate regressions. We provide empirical results from mathematical and chemical engineering examples that evaluate the performance of PLS, NLPLS, PPR, and three-layer neural networks on (1) response variable predictions, (2) model parsimony, (3) computational requirements, and (4) robustness to starting values.

The curves and surfaces used by NLPLS are motivated by the nonlinear principal components analysis (NLPCA) method of nonlinear feature extraction. We develop certain properties of NLPCA and discuss its relation to the principal curve method. Both methods attempt to reduce the dimension of a set of multivariate observations by fitting a curve through the middle of the observations and projecting the observations onto this curve. The two methods fit their models under a similar objective function, with one important difference: NLPCA requires the function that maps observed variables to scores (the projection index) to be continuous. We show that the effects of this constraint are (1) NLPCA is unable to model curves and surfaces that intersect themselves, and (2) the NLPCA "projections" are suboptimal, producing larger approximation error. We show how NLPCA score values can be interpreted and give the results of a small simulation study comparing the two methods.

ACKNOWLEDGMENTS

I thank my advisors, Ajit Tamhane and Richard Mah, for suggesting this topic and for their guidance throughout this project. They have challenged me with many excellent questions, and I have benefited greatly from their supervision. I also thank my other committee members, Thomas Severini and Lina Massone, for many helpful discussions and for suggesting some important references. I am indebted to Northwestern University's Department of Statistics for supporting my studies and providing me with fine computational facilities. I thank Trevor Hastie for sending me copies of technical reports and for helpful correspondence. I thank John Fildes and Scott Milkovich for providing me with data, and Rick Briesch for many helpful discussions. I thank James and Susan Crawford for their support and encouragement. I am grateful to all my teachers, particularly Donald McLaughlin, Douglas Nelson, Robert Johnson, Victor Baston, and Chris Potts, for their outstanding teaching and encouragement. I am also grateful to my first and best teachers, my parents and family.
Finally, my wife Elisabeth has enriched my life greatly and provided an important balance during my years in graduate school.
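
The following short Python sketch illustrates the two-function composition described in the abstract: a projection function f maps the predictors to a one-dimensional score, and a second function g predicts the response from that score, with both functions represented by small feedforward networks and fitted jointly by least squares. The synthetic data, network sizes, and choice of optimizer are illustrative assumptions and do not reproduce the implementation developed in the thesis.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data: three collinear predictors driven by a single latent variable u.
n, p, h = 200, 3, 4                                  # samples, predictors, hidden units
u = rng.uniform(-1.0, 1.0, size=(n, 1))
X = np.hstack([u, u**2, np.sin(np.pi * u)]) + 0.05 * rng.standard_normal((n, p))
y = np.cos(np.pi * u).ravel() + 0.05 * rng.standard_normal(n)

def unpack(w):
    # Split the flat parameter vector into the weights of f (projection) and g (prediction).
    i = 0
    W1 = w[i:i + p * h].reshape(p, h); i += p * h    # f: inputs -> hidden layer
    b1 = w[i:i + h]; i += h
    a1 = w[i:i + h]; i += h                          # f: hidden layer -> score
    W2 = w[i:i + h]; i += h                          # g: score -> hidden layer
    b2 = w[i:i + h]; i += h
    a2 = w[i:i + h]; i += h                          # g: hidden layer -> response
    return W1, b1, a1, W2, b2, a2

def forward(w, X):
    W1, b1, a1, W2, b2, a2 = unpack(w)
    t = np.tanh(X @ W1 + b1) @ a1                    # scores t = f(x)
    yhat = np.tanh(np.outer(t, W2) + b2) @ a2        # predictions g(t)
    return t, yhat

def loss(w):
    return np.mean((y - forward(w, X)[1]) ** 2)

n_par = p * h + 5 * h
w0 = 0.1 * rng.standard_normal(n_par)
fit = minimize(loss, w0, method="L-BFGS-B")          # gradient approximated numerically
scores, yhat = forward(fit.x, X)
print("training mean squared error:", round(float(loss(fit.x)), 4))

Fitting the projection and prediction functions jointly under squared error mirrors the projection-based structure of PLS and PPR while allowing the projection itself to be nonlinear; the fitted scores play the role that linear score vectors play in PLS.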
