Cluster Analysis of Face Images and Literature Data by Evolutionary Distance Metric Learning

Evolutionary distance metric learning (EDML) is an efficient technique for solving clustering problems when some background knowledge is available. However, EDML has not yet been applied to real-world problems. We therefore demonstrate EDML for cluster analysis and visualization on two datasets: a face recognition image dataset and a literature dataset. For facial image clustering, we show an improvement in the cluster validity index and analyze the distributions of classes (ages) as visualized by a self-organizing map and by a K-means clustering with a K-nearest-neighbor centroid graph. For the literature dataset, we analyze the topics (i.e., clusters of articles) that are most likely to win the best paper award. Applying EDML to these datasets yielded qualitatively promising visualization results that demonstrate the practicability and effectiveness of EDML.
