Neural Networks for Efficient Nonlinear Online Clustering

Unsupervised learning techniques such as clustering and sparse coding have been adapted to data sets with nonlinear relationships through the use of kernel machines. These techniques often require explicit computation of the kernel matrix, which becomes expensive as the number of inputs grows and makes them unsuitable for efficient online learning. This paper proposes an algorithm and a neural architecture for online approximate nonlinear kernel clustering using any shift-invariant kernel. The novel model outperforms traditional clustering methods based on low-rank kernel approximation, and it requires significantly less memory than the popular kernel k-means while showing competitive performance on large data sets.
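The core idea above can be sketched as follows: map each incoming point through a random Fourier feature (RFF) approximation of a shift-invariant kernel, then run an ordinary online k-means update in that fixed-dimensional feature space, so memory stays constant in the number of inputs. This is a minimal illustrative sketch, not the paper's exact architecture; the RBF kernel, feature dimension, cluster count, and running-mean update rule are all assumptions chosen for clarity.

```python
# Minimal sketch (assumed setup, not the paper's exact model): online
# clustering via random Fourier features for the RBF kernel
# k(x, y) = exp(-gamma * ||x - y||^2).
import numpy as np

rng = np.random.default_rng(0)
d, D, k, gamma = 2, 128, 3, 1.0  # input dim, RFF dim, clusters, RBF width

# RFF map z(x) = sqrt(2/D) * cos(Wx + b); W ~ N(0, 2*gamma*I), b ~ U[0, 2pi].
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# Online k-means in the D-dimensional feature space: memory is O(k*D),
# independent of the number of points seen so far.
centers = rng.normal(size=(k, D))
counts = np.zeros(k)

def update(x):
    z = rff(x)
    j = int(np.argmin(np.linalg.norm(centers - z, axis=1)))  # nearest center
    counts[j] += 1
    centers[j] += (z - centers[j]) / counts[j]               # running mean
    return j

# Stream points drawn from three well-separated Gaussian blobs.
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
labels = [update(means[i % 3] + 0.3 * rng.normal(size=d)) for i in range(300)]
```

Each update touches only one center, so the cost per point is O(k·D) time and the model never stores past inputs, which is what makes the approach suitable for online learning.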
