Weighted Multi-View Possibilistic C-Means Clustering with L2 Regularization

With the rapid growth of social media, virtual communities and networks, multi-view data have become increasingly common. Multi-view data generally contain different feature components in different views. Although these data are extracted in different ways (views) from diverse settings and domains, they describe the same samples, which makes them highly related. Hence, applying single-view clustering methods to multi-view data makes it difficult to obtain desirable clustering results, and multi-view clustering methods that exploit the available multi-view information are needed. Most current multi-view clustering techniques are based either on k-means, owing to its conceptual simplicity, or on fuzzy c-means (FCM), in which a data point can belong to more than one cluster with membership degrees between 0 and 1. However, the performance of k-means and FCM may degrade in the presence of noise and outliers, especially on large or high-dimensional datasets. Because k-means and FCM require each data point's memberships across all clusters to sum to one, they tend to assign a high membership value even to an outlier or noisy data point that lies far from every cluster center. To address this drawback, possibilistic c-means (PCM) relaxes this membership constraint so that outliers and noisy data points can be properly identified. While various multi-view extensions of k-means and FCM exist, no multi-view extension of PCM has been reported in the literature. We therefore build our multi-view clustering model on PCM. In this paper, we propose novel weighted multi-view PCM algorithms, called W-MV-PCM and W-MV-PCM with L2 regularization (W-MV-PCM-L2), which cluster multi-view data while incorporating view and feature weights into the PCM framework. In multi-view clustering, different views may differ in importance, and each view may contain some irrelevant features.
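The difference between the FCM sum-to-one constraint and the relaxed PCM constraint can be made concrete with a small sketch. It assumes the classic FCM membership update and the original Krishnapuram and Keller PCM typicality 1 / (1 + (d²/η)^(1/(m-1))); the PCM variant used inside W-MV-PCM may differ, and the function names, cluster centers and scale parameters `etas` below are hypothetical.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between point x and center c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def fcm_memberships(x, centers, m=2.0):
    """FCM memberships of x; by construction they sum to 1 across clusters.

    Assumes x does not coincide with any center (all distances > 0).
    """
    d2 = [squared_distance(x, c) for c in centers]
    exponent = 1.0 / (m - 1.0)
    return [1.0 / sum((d2[i] / d2[k]) ** exponent for k in range(len(d2)))
            for i in range(len(d2))]


def pcm_typicalities(x, centers, etas, m=2.0):
    """PCM typicalities of x (Krishnapuram and Keller form); no sum-to-one constraint.

    etas[i] is a per-cluster scale parameter (hypothetical values here).
    """
    exponent = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (squared_distance(x, c) / eta) ** exponent)
            for c, eta in zip(centers, etas)]


centers = [(0.0, 0.0), (10.0, 0.0)]   # hypothetical cluster centers
etas = [1.0, 1.0]                      # hypothetical PCM scale parameters
outlier = (5.0, 50.0)                  # far from both centers
print("FCM:", fcm_memberships(outlier, centers))
print("PCM:", pcm_typicalities(outlier, centers, etas))
```

For the point (5, 50), which is far from both centers, FCM splits its membership as 0.5 and 0.5 because the values must sum to one, whereas both PCM typicalities are about 0.0004, so the point stands out as an outlier.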
In the proposed algorithms, a learning scheme is constructed to compute the view weights and the feature weights within each view. This scheme identifies the importance of each view and, at the same time, identifies and selects the relevant features in each view. W-MV-PCM-L2 is compared with existing multi-view clustering algorithms on both synthetic and real datasets. The experimental results are evaluated using the accuracy rate (AR) and external validity indexes such as the Rand index (RI) and normalized mutual information (NMI). Comparisons with existing algorithms under the AR, RI and NMI criteria show that the proposed W-MV-PCM-L2 algorithm is feasible and effective for multi-view clustering.
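AR, RI and NMI are named above but not defined. The sketch below uses common textbook definitions, which is an assumption: AR is taken as accuracy under the best one-to-one matching of clusters to classes (found here by brute force, so it only suits a few clusters), RI as the fraction of sample pairs on which the two partitions agree, and NMI with the geometric-mean normalization I(T;P) / sqrt(H(T) H(P)). The paper may use other variants, and the function names and example labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, permutations


def accuracy_rate(true, pred):
    """AR: fraction of samples correct under the best one-to-one cluster-to-class mapping.

    Brute force over permutations; assumes few clusters and no more clusters than classes.
    """
    classes = sorted(set(true))
    clusters = sorted(set(pred))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(true, pred))
        best = max(best, hits)
    return best / len(true)


def rand_index(true, pred):
    """RI: fraction of sample pairs that both partitions put together or both keep apart."""
    pairs = list(combinations(range(len(true)), 2))
    agree = sum((true[i] == true[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(true, pred):
    """NMI with geometric-mean normalization: I(T;P) / sqrt(H(T) * H(P))."""
    n = len(true)
    joint = Counter(zip(true, pred))
    count_t, count_p = Counter(true), Counter(pred)
    mi = sum(c / n * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    h_t, h_p = entropy(true), entropy(pred)
    if h_t == 0.0 or h_p == 0.0:
        # Degenerate case: a single-cluster partition carries no information.
        return 1.0 if h_t == h_p else 0.0
    return mi / math.sqrt(h_t * h_p)


true = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]   # hypothetical clustering output
print(accuracy_rate(true, pred), rand_index(true, pred), nmi(true, pred))
```

For these labels AR is 5/6 and RI is 2/3. A clustering that is perfect up to relabeling (for example, true [0, 0, 1, 1] and predicted [1, 1, 0, 0]) scores 1 on all three indexes, since none of them depends on the names given to the clusters.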

Miin-Shen Yang | Josephine B.M. Benjamin