Nonlinear partial least squares

Edward Carl Malthouse

We propose a new nonparametric regression method for high-dimensional data, nonlinear partial least squares (NLPLS). NLPLS is motivated by projection-based regression methods, e.g., partial least squares (PLS), projection pursuit regression (PPR), and feedforward neural networks. The model takes the form of a composition of two functions. The first function in the composition projects the predictor variables onto a lower-dimensional curve or surface, yielding scores, and the second predicts the response variable from the scores. We implement NLPLS with feedforward neural networks. NLPLS will often produce a more parsimonious model (fewer score vectors) than projection-based methods, and the model is well suited for detecting outliers and future covariates requiring extrapolation. The scores are also shown to have useful interpretations. We also extend the model to multiple response variables and discuss when multiple response variables should be modeled simultaneously and when they should be modeled with separate regressions. We provide empirical results from mathematical and chemical engineering examples which evaluate the performance of PLS, NLPLS, PPR, and three-layer neural networks on (1) response variable predictions, (2) model parsimony, (3) computational requirements, and (4) robustness to starting values.

The curves and surfaces used by NLPLS are motivated by the nonlinear principal components analysis (NLPCA) method for nonlinear feature extraction. We develop certain properties of NLPCA and discuss its relation to the principal curve method. Both methods attempt to reduce the dimension of a set of multivariate observations by fitting a curve through the middle of the observations and projecting the observations onto this curve.
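As a rough illustration of fitting a curve through the middle of the observations and projecting onto it, the shared least-squares criterion is commonly written along the following lines. The notation (x_i, f, s_f, p, r, n) is assumed here for illustration and is not taken from the text above; this is a sketch of the usual formulation, not necessarily the exact objective used in this work.

```latex
% Assumed notation (not from the source):
%   x_i \in \mathbb{R}^p          observations, i = 1, ..., n
%   s_f : \mathbb{R}^p \to \mathbb{R}^r   projection index (scores), with r < p
%   f   : \mathbb{R}^r \to \mathbb{R}^p   parameterization of the curve or surface
\[
  \min_{f,\; s_f} \; \frac{1}{n} \sum_{i=1}^{n}
  \bigl\lVert x_i - f\bigl(s_f(x_i)\bigr) \bigr\rVert^{2}
\]
% Principal curves: the score of x indexes a closest point on the curve,
\[
  s_f(x) \in \arg\min_{s} \, \lVert x - f(s) \rVert ,
\]
% a map that need not be continuous in x.
% NLPCA: s_f is instead restricted to be a continuous function of x.
```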
The two methods fit their models under a similar objective function, with one important difference: NLPCA requires the function that maps observed variables to scores (the projection index) to be continuous. We show that the effects of this constraint are that (1) NLPCA is unable to model curves and surfaces which intersect themselves and (2) the NLPCA "projections" are suboptimal, producing larger approximation error. We show how NLPCA score values can be interpreted and give the results of a small simulation study comparing the two methods.

ACKNOWLEDGMENTS

I thank my advisors, Ajit Tamhane and Richard Mah, for suggesting this topic and for their guidance throughout this project. They have challenged me with many excellent questions and I have benefited greatly from their supervision. I also thank my other committee members, Thomas Severini and Lina Massone, for many helpful discussions and for suggesting some important references. I am indebted to Northwestern University's department of statistics for supporting my studies and providing me with fine computational facilities. I thank Trevor Hastie for sending me copies of technical reports and for helpful correspondence. I thank John Fildes and Scott Milkovich for providing me with data and Rick Briesch for many helpful discussions. I thank James and Susan Crawford for their support and encouragement. I am grateful to all my teachers, particularly Donald McLaughlin, Douglas Nelson, Robert Johnson, Victor Baston, and Chris Potts, for their outstanding teaching and encouragement. I am also grateful to my first and best teachers, my parents and family. Finally, my wife Elisabeth has enriched my life greatly and has provided an important balance in my life during my years in graduate school.

