Spectral Non-Convex Optimization for Dimension Reduction with Hilbert-Schmidt Independence Criterion

The Hilbert-Schmidt Independence Criterion (HSIC) is a kernel-based dependence measure with applications across machine learning. Conveniently, the objectives of many HSIC-based dimensionality reduction formulations reduce to the same optimization problem. However, the non-convexity of the objective induced by non-linear kernels poses a serious challenge to optimization efficiency and limits the potential of HSIC-based formulations; as a result, only linear kernels have been computationally tractable in practice. This paper proposes a spectral-based optimization algorithm that extends beyond the linear kernel. The algorithm identifies a family of suitable kernels and provides first- and second-order local guarantees when a fixed point is reached. Furthermore, we propose a principled initialization strategy, removing the need to rerun the algorithm from multiple random initializations. Compared with state-of-the-art optimization algorithms, our empirical results on real data show a run-time improvement of up to a factor of $10^5$, while consistently achieving a lower cost and lower classification/clustering errors. The implementation source code is publicly available at this https URL.
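For reference, the dependence measure underlying these formulations has a simple closed-form empirical estimate, $\widehat{\mathrm{HSIC}}(X, Y) = (n-1)^{-2}\,\mathrm{tr}(KHLH)$, where $K$ and $L$ are kernel matrices computed on the two samples and $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}$ is the centering matrix. The sketch below illustrates this estimator in Python, assuming Gaussian kernels; the function and parameter names are illustrative and are not taken from the paper's released code.

```python
import numpy as np

def rbf_kernel(Z, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of Z (shape (n, d))."""
    sq_norms = np.sum(Z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic_empirical(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    K = rbf_kernel(X, sigma_x)           # kernel matrix on the inputs
    L = rbf_kernel(Y, sigma_y)           # kernel matrix on the labels/targets
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Example: HSIC is near zero for independent samples, larger for dependent ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y_indep = rng.normal(size=(200, 1))
Y_dep = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(200, 1))
print(hsic_empirical(X, Y_indep), hsic_empirical(X, Y_dep))
```

Roughly speaking, the dimensionality reduction problems referred to above maximize such an HSIC score over a low-dimensional projection of the data, which is what renders the objective non-convex once a non-linear kernel is used.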
