Centralized multi-scale singular value decomposition for feature construction in LIDAR image classification problems

Creating and selecting relevant features for machine learning applications (including image classification) typically requires significant domain knowledge. It is thus desirable to cover at least part of that process with semi-automated techniques capable of discovering and visualizing the geometric characteristics of images that are potentially relevant to the classification objective. In this work, we propose to use multi-scale singular value decomposition (MSVD) together with an approximate nearest neighbors algorithm: both have recently been realized using a randomized approach and can be run efficiently on large, high-dimensional datasets (sparse or dense). We apply this technique to create a multi-scale view of every point in a publicly available set of LIDAR data of riparian images, where the classification objective is to separate ground from vegetation. We perform “centralized MSVD” for every point and its neighborhood, which is generated by an approximate nearest neighbor algorithm. After this procedure, the original 3-dimensional data is augmented by 36 dimensions generated by MSVD (at three different scales); the augmented data is then processed using a novel discretization pre-processing method and the SVM classification algorithm with an RBF kernel. The resulting classification error is two times lower than that previously obtained. The generic nature of the MSVD mechanism and the standard mechanism used for classification (SVM) suggest that the proposed approach may be useful for other problems as well.
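The per-point feature construction described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the function name `msvd_features` and the neighborhood sizes `scales=(8, 16, 32)` are hypothetical; "centralized" is interpreted here as subtracting the neighborhood mean before the SVD; an exact brute-force neighbor search stands in for the approximate nearest neighbor algorithm; and the 36 added dimensions are assumed to split into 3 singular values plus a flattened 3 x 3 matrix of right singular vectors at each of three scales.

```python
import numpy as np


def msvd_features(points, scales=(8, 16, 32)):
    """Append per-scale SVD features of each point's neighborhood.

    Hypothetical sketch: for 3-D input and three scales this returns
    3 + 3 * (3 + 9) = 39 columns per point.
    """
    points = np.asarray(points, dtype=float)
    n, d = points.shape

    # Exact brute-force neighbor search (assumption: this replaces the
    # approximate nearest neighbor algorithm mentioned in the abstract).
    diffs = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    order = np.argsort(dist2, axis=1)  # column 0 is the point itself

    blocks = [points]
    for k in scales:
        feats = np.empty((n, d + d * d))
        for i in range(n):
            nbrs = points[order[i, :k]]
            # Assumed meaning of "centralized": center on the neighborhood mean.
            centered = nbrs - nbrs.mean(axis=0)
            _, s, vt = np.linalg.svd(centered, full_matrices=False)
            feats[i, :d] = s / np.sqrt(k)  # scale-normalized singular values
            feats[i, d:] = vt.ravel()      # right singular vectors, row-major
        blocks.append(feats)
    return np.hstack(blocks)
```

Under these assumptions, a locally planar neighborhood (such as flat ground) yields a smallest singular value near zero, whereas a volumetric neighborhood (such as vegetation) yields three singular values of comparable size. The sign of each singular vector is arbitrary, so a practical pipeline would likely need to fix a sign convention before passing these columns to a classifier.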
