Locally Defined Principal Curves and Surfaces

Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many machine learning applications as tools for generalization, dimensionality reduction, and feature extraction. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric understanding of principal curves and surfaces. It also gives a unifying view of clustering, principal curve fitting, and manifold learning, since all three can be regarded as finding principal manifolds of different intrinsic dimensionalities. The theory does not impose any particular density estimation method; it can be used with any density estimator that gives continuous first and second derivatives. Therefore, we first present our principal curve/surface definition without assuming any particular density estimation method. Afterwards, we develop practical algorithms for the commonly used kernel density estimation (KDE) and Gaussian mixture models (GMM). Results of these algorithms are presented on notional data sets as well as real applications, with comparisons to other approaches in the principal curve literature. In summary, we present a novel theoretical understanding of principal curves and surfaces, practical algorithms as general-purpose machine learning tools, and applications of these algorithms to several practical problems.
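To make the gradient/Hessian view concrete, here is a minimal sketch under stated assumptions. It assumes an isotropic Gaussian KDE with a single bandwidth `sigma`; the abstract fixes neither the kernel nor the bandwidth. Roughly, a point lies on a d-dimensional principal set when the density gradient is orthogonal to n - d Hessian eigenvectors whose eigenvalues are negative.

The sketch computes the KDE value, gradient, and Hessian in closed form. It then moves a point with mean-shift steps projected onto the n - d eigen-directions of maximal negative curvature, which is a subspace-constrained mean-shift iteration. As a practical variant, it uses the Hessian of the log-density to pick those directions. With d = 0 the projection is the identity, so the iteration reduces to plain mean shift and converges to a mode (clustering). With d = 1 it stops on a density ridge, i.e. a principal curve. The function names `kde_grad_hess` and `scms_project` are hypothetical, and this is not claimed to be the authors' exact implementation.

```python
import numpy as np


def kde_grad_hess(x, data, sigma):
    """Density, gradient and Hessian of an isotropic Gaussian KDE at point x.

    Assumption: a single scalar bandwidth `sigma` shared by all dimensions.
    """
    n, dim = data.shape
    diff = x - data                                   # rows are x - x_i
    sq = np.sum(diff ** 2, axis=1)
    norm = (2.0 * np.pi * sigma ** 2) ** (-dim / 2.0)
    c = norm * np.exp(-sq / (2.0 * sigma ** 2))       # kernel values kappa(x - x_i)
    p = c.mean()                                      # density estimate p(x)
    g = -(c[:, None] * diff).sum(axis=0) / (n * sigma ** 2)
    outer = np.einsum("i,ij,ik->jk", c, diff, diff)   # sum_i c_i (x - x_i)(x - x_i)^T
    H = (outer / sigma ** 4 - c.sum() * np.eye(dim) / sigma ** 2) / n
    return p, g, H, c


def scms_project(x, data, sigma, d=1, tol=1e-6, max_iter=500):
    """Move x onto the d-dimensional principal set of the KDE (hypothetical helper).

    Each step is a mean-shift update restricted to the n - d eigenvectors of the
    log-density Hessian with the most negative eigenvalues (directions "across"
    the manifold). With d = 0 this is ordinary mean shift toward a mode.
    """
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        p, g, H, c = kde_grad_hess(x, data, sigma)
        L = H / p - np.outer(g, g) / p ** 2           # Hessian of log p(x)
        _, evecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
        V = evecs[:, : x.size - d]                    # n - d most negative directions
        m = (c[:, None] * data).sum(axis=0) / c.sum() - x  # mean-shift vector
        step = V @ (V.T @ m)                          # constrained update
        x += step
        if np.linalg.norm(step) < tol:
            break
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.uniform(0.0, 2.0 * np.pi, 400)
    r = 1.0 + 0.05 * rng.standard_normal(400)
    ring = np.c_[r * np.cos(theta), r * np.sin(theta)]   # noisy unit circle
    y = scms_project([1.3, 0.0], ring, sigma=0.3, d=1)
    print("projected point:", y, "radius:", np.linalg.norm(y))
```

On the noisy circle, a point started outside the ring is pulled radially onto the density ridge. Its tangential motion is removed at each step, so it does not collapse to a single mode. Because the KDE smooths the ring, the ridge can sit slightly inside radius 1.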
