Multilevel orthogonal Bochner function subspaces with applications to robust machine learning

We treat the data as realizations of a random field in a suitable Bochner space. Our key observation is that the classes can reside predominantly in two distinct subspaces. To expose the separation between the classes, we use the Karhunen-Loève expansion to construct the appropriate subspaces. The novel features that form these bases are obtained by applying a coordinate transformation based on recent Functional Data Analysis theory for anomaly detection. The associated signal decomposition is an exact hierarchical tensor product expansion with known optimality properties for approximating stochastic processes (random fields) with finite-dimensional function spaces. Using a hierarchical finite-dimensional expansion of the nominal class, a series of orthogonal nested subspaces is constructed for detecting anomalous signal components. The projection coefficients of the input data in these subspaces are then used to train a Machine Learning (ML) classifier. Because the signal is split into nominal and anomalous projection components, clearer separation surfaces between the classes arise. In fact, we show that with a sufficiently accurate estimate of the covariance structure of the nominal class, a sharp classification can be obtained. This is particularly advantageous for large unbalanced datasets. We demonstrate the approach on a number of high-dimensional datasets, where it yields significant increases in accuracy compared to using the same ML algorithm on the original feature data. Our tests on the Alzheimer's Disease ADNI dataset show a dramatic increase in accuracy (from 48% to 89%). Furthermore, tests using unbalanced semi-synthetic datasets created from the benchmark GCM dataset confirm increased accuracy as the dataset becomes more unbalanced.
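The split into nominal and anomalous projection components can be illustrated with a minimal single-level sketch. It is not the multilevel construction described above: it assumes a discrete, empirical Karhunen-Loève basis (computed by an SVD of centered nominal samples), synthetic data, and hypothetical helper names (`fit_nominal_subspace`, `split_features`). The nominal coefficients and the residual norm in the orthogonal complement are the kind of features that could then be passed to any ML classifier.

```python
import numpy as np

def fit_nominal_subspace(X_nominal, n_components):
    """Estimate a truncated discrete Karhunen-Loeve basis from nominal samples."""
    mean = X_nominal.mean(axis=0)
    # Rows of Vt are orthonormal eigenvectors of the empirical covariance,
    # ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_nominal - mean, full_matrices=False)
    return mean, Vt[:n_components]

def split_features(X, mean, basis):
    """Return nominal-subspace coefficients and the norm of the orthogonal residual."""
    centered = X - mean
    coeffs = centered @ basis.T          # projection onto the nominal subspace
    residual = centered - coeffs @ basis  # component in the orthogonal complement
    return coeffs, np.linalg.norm(residual, axis=1)

# Synthetic illustration (assumed data, not from the paper): nominal signals
# live near a k-dimensional subspace; anomalous signals carry an extra
# component along a direction orthogonal to that subspace.
rng = np.random.default_rng(0)
dim, k = 50, 5
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
nominal_dirs, anomalous_dir = Q[:, :k], Q[:, k]

def sample_nominal(n):
    signal = 3.0 * (rng.standard_normal((n, k)) @ nominal_dirs.T)
    return signal + 0.05 * rng.standard_normal((n, dim))

train_nominal = sample_nominal(200)
test_nominal = sample_nominal(100)
test_anomalous = sample_nominal(20) + 2.0 * anomalous_dir

mean, basis = fit_nominal_subspace(train_nominal, k)
_, res_nominal = split_features(test_nominal, mean, basis)
_, res_anomalous = split_features(test_anomalous, mean, basis)

print("largest nominal residual:   ", res_nominal.max())
print("smallest anomalous residual:", res_anomalous.min())
```

In this toy setting the residual norm alone separates the two groups, which mirrors the claim that an accurate estimate of the nominal covariance structure leads to clearer separation surfaces. Real data would call for the hierarchical expansion and a trained classifier on the projection coefficients.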
