Paper information - Feature learning using Generalized Extreme Value distribution based K-means clustering


Recent studies have shown that K-means with a large K can effectively learn local image patch features; combined with appropriate pooling strategies, it performs very well in many visual object recognition tasks. This paper proposes GEV-Kmeans, an improved K-means clustering algorithm based on the Generalized Extreme Value (GEV) distribution. Our key observation is that the squared distance from a point to its closest center follows the GEV distribution when the number of clusters is large. Unlike standard K-means, we minimize the reconstruction error while ignoring points with low GEV probability (i.e., rare events) and focus on the other points, which may be more critical in characterizing the underlying data distribution. Consequently, our algorithm handles outliers well. Experimental results demonstrate the effectiveness of our algorithm.
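The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and several parts are assumptions. The abstract does not say how the GEV distribution is fitted, so the sketch substitutes the Gumbel distribution (the GEV special case with shape parameter 0), fitted by the method of moments. The function names, the `drop_fraction` parameter, and the rule of discarding a fixed fraction of lowest-density points are all hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_gumbel(values):
    """Method-of-moments Gumbel fit (assumed stand-in for a full GEV fit)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    scale = max(math.sqrt(6.0 * var) / math.pi, 1e-12)  # avoid zero scale
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def gumbel_pdf(x, loc, scale):
    """Gumbel density; z is clamped from below so exp() cannot overflow."""
    z = max((x - loc) / scale, -700.0)
    return math.exp(-z - math.exp(-z)) / scale


def gev_kmeans(points, k, drop_fraction=0.1, iters=20, init=None, seed=0):
    """K-means-style loop that skips the lowest-density points in each update.

    drop_fraction (hypothetical) is the share of points ignored per iteration.
    Returns the centers and the indices of points kept in the last iteration.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centers = [list(c) for c in start]
    n = len(points)
    n_keep = max(k, n - int(drop_fraction * n))
    kept = list(range(n))
    for _ in range(iters):
        # Assignment step: nearest center and squared distance for each point.
        assign, d2 = [], []
        for p in points:
            dists = [sq_dist(p, c) for c in centers]
            j = min(range(k), key=dists.__getitem__)
            assign.append(j)
            d2.append(dists[j])
        # Score each squared distance under the fitted distribution.
        loc, scale = fit_gumbel(d2)
        density = [gumbel_pdf(d, loc, scale) for d in d2]
        # Keep the n_keep points with the highest density (rare events dropped).
        kept = sorted(range(n), key=lambda i: density[i], reverse=True)[:n_keep]
        # Update step: recompute each center from its kept members only.
        for j in range(k):
            members = [points[i] for i in kept if assign[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, sorted(kept)
```

With two tight clusters and one distant point, the distant point's squared distance falls far in the right tail of the fitted density, so it is excluded from the center updates. A quantile-based threshold or a proper three-parameter GEV fit would be equally reasonable choices here.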

Zeyu Li | Ruzena Bajcsy | Oriol Vinyals | Harlyn Baker
