A semi-supervised fuzzy co-clustering framework and its application to Twitter data analysis

Semi-supervised clustering is an efficient scheme for exploiting data with partial class information: the distribution of the unlabeled data is estimated with the support of class labels that are available for only some of the objects. This paper proposes a novel framework, derived from the multinomial mixture concept, for fuzzy co-clustering of cooccurrence information under partial supervision. Co-clustering is useful for extracting pair-wise clusters of objects and items from cooccurrence information and has been used in various applications, such as document-keyword analysis and the analysis of customer-product purchase histories. Several experiments, including an analysis of Twitter data, demonstrate that the framework can improve the classification quality of the fuzzified co-cluster structure. The proposed semi-supervised framework is therefore expected to be a powerful tool for Big Data analysis, where data volumes are huge but only partial supervision is available.
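To make the idea of partial supervision in a multinomial mixture concrete, the following is a minimal sketch, not the objective function or update rules of the proposed framework. It runs ordinary EM for a plain (probabilistic, non-fuzzified) multinomial mixture on a small cooccurrence matrix and simply holds the memberships of labeled objects fixed at their known class. The function name `semi_supervised_mmm`, the toy matrix, the smoothing constant `eps`, and the assumption that every cluster has at least one labeled object are all illustrative choices made here.

```python
import math


def semi_supervised_mmm(R, n_clusters, labels, n_iter=50, eps=1e-6):
    """EM for a multinomial mixture with some object memberships fixed.

    R          : list of rows, R[i][j] = cooccurrence count of object i and item j
    labels     : dict {object index: known cluster index} (partial supervision)
    Returns (U, W): U[i][c] is the object membership, W[c][j] the item
    probability within cluster c (each row of W sums to 1).
    Assumption: every cluster has at least one labeled object, which also
    breaks the symmetry of the uniform initialisation.
    """
    n, m = len(R), len(R[0])

    # Labeled objects get one-hot memberships; the rest start uniform.
    U = [[1.0 / n_clusters] * n_clusters for _ in range(n)]
    for i, c in labels.items():
        U[i] = [1.0 if k == c else 0.0 for k in range(n_clusters)]

    W = []
    for _ in range(n_iter):
        # M-step: mixing weights and per-cluster item probabilities.
        pi = [sum(U[i][c] for i in range(n)) / n for c in range(n_clusters)]
        W = []
        for c in range(n_clusters):
            raw = [sum(U[i][c] * R[i][j] for i in range(n)) + eps
                   for j in range(m)]
            total = sum(raw)
            W.append([v / total for v in raw])

        # E-step: update only the unlabeled objects.
        for i in range(n):
            if i in labels:
                continue  # supervised memberships stay fixed
            logp = [math.log(pi[c] + eps)
                    + sum(R[i][j] * math.log(W[c][j]) for j in range(m))
                    for c in range(n_clusters)]
            top = max(logp)  # subtract the max for numerical stability
            expd = [math.exp(v - top) for v in logp]
            norm = sum(expd)
            U[i] = [v / norm for v in expd]
    return U, W


# Toy data: 4 objects x 4 items; objects 0 and 2 carry class labels.
R = [
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
]
U, W = semi_supervised_mmm(R, n_clusters=2, labels={0: 0, 2: 1})
for i, row in enumerate(U):
    print(i, [round(v, 3) for v in row])
```

In this sketch the object memberships `U` and the item probabilities `W` together play the role of a co-cluster description: each cluster is characterised both by the objects assigned to it and by the items that are typical of it. A fuzzified variant would additionally tune the degree of fuzziness of these memberships, which is not attempted here.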
