Feature learning using Generalized Extreme Value distribution-based K-means clustering

Recent studies have shown that K-means with a large K can effectively learn local image patch features and, combined with appropriate pooling strategies, performs very well on many visual object recognition tasks. This paper proposes GEV-Kmeans, an improved K-means clustering algorithm based on the Generalized Extreme Value (GEV) distribution. Our key observation is that, when the number of clusters is large, the squared distance from a point to its closest center follows a GEV distribution. Unlike standard K-means, which minimizes the reconstruction error over all points, we minimize the reconstruction error while ignoring points with low GEV probability (i.e., rare events), and focus on the remaining points, which are likely more critical for characterizing the underlying data distribution. As a result, our algorithm handles outliers well. Experimental results demonstrate the effectiveness of our algorithm.
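The idea of excluding rare closest-center distances from the center update can be illustrated with a toy sketch. This is not the authors' implementation, and several pieces are assumptions: the GEV fit is simplified to a Gumbel law (the GEV case with shape parameter zero) fitted by the method of moments; a point counts as a "rare event" when its fitted upper-tail probability P(D > d) falls below a hypothetical threshold `alpha`; and the names `fit_gumbel`, `gumbel_tail`, and `gev_kmeans` are illustrative only.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_gumbel(samples):
    """Method-of-moments fit of a Gumbel law (GEV with shape 0).

    ASSUMPTION: a simplification of a full three-parameter GEV fit.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    beta = math.sqrt(6.0 * var) / math.pi  # scale
    mu = mean - EULER_GAMMA * beta         # location
    return mu, beta


def gumbel_tail(d, mu, beta):
    """Upper-tail probability P(D > d) under the fitted Gumbel law."""
    if beta == 0.0:
        return 1.0 if d <= mu else 0.0
    z = max((d - mu) / beta, -700.0)  # clamp so exp(-z) cannot overflow
    return -math.expm1(-math.exp(-z))  # 1 - exp(-exp(-z)), numerically stable


def gev_kmeans(points, k, alpha=0.01, iters=20, init_centers=None, seed=0):
    """Hypothetical GEV-style K-means: Lloyd iterations whose center
    updates skip points flagged as rare events (tail prob < alpha).

    Returns (centers, labels, keep); labels/keep are from the last
    assignment step.
    """
    if init_centers is None:
        init_centers = random.Random(seed).sample(points, k)
    centers = [list(map(float, c)) for c in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        # Assignment step: closest center and its squared distance.
        labels, dists = [], []
        for p in points:
            ds = [sq_dist(p, c) for c in centers]
            j = min(range(k), key=ds.__getitem__)
            labels.append(j)
            dists.append(ds[j])
        # Fit the extreme-value model to the closest-center distances and
        # flag rare events (large distances with low upper-tail probability).
        mu, beta = fit_gumbel(dists)
        keep = [gumbel_tail(d, mu, beta) >= alpha for d in dists]
        # Update step: recompute each center from its non-rare points only.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, j, ok in zip(points, labels, keep):
            if ok:
                counts[j] += 1
                for t in range(dim):
                    sums[j][t] += p[t]
        for j in range(k):
            if counts[j]:  # keep the previous center if no point survived
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers, labels, keep
```

For example, with two 3x3 grids of points centered at (1, 1) and (11, 1) plus one far-away point at (100, 100), the far point's squared distance lands in the fitted upper tail, is excluded from the update, and the centers stay at the grid means. Setting `alpha=0.0` keeps every point and reduces the loop to ordinary Lloyd iterations. The extreme-value argument above relies on a large number of clusters; the toy example uses K = 2 only for readability.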