Another Look at Principal Curves and Surfaces

Principal curves have been defined as smooth curves passing through the “middle” of a multidimensional data set. They are nonlinear generalizations of the first principal component, and their definition rests on a characterization of that component. We establish a new characterization of the first principal component and base a new definition of a principal curve on this property. We introduce the notion of principal oriented points and prove the existence of principal curves passing through these points. We extend the definition of principal curves to multivariate data sets and propose an algorithm to find them. These new notions lead us to generalize the definition of total variance, and successive principal curves are defined recursively from this generalization. The new methods are illustrated on simulated and real data sets.
