Parallel architectures for fuzzy triadic similarity learning

In the context of document co-clustering, we define a new similarity measure that iteratively computes similarities while combining fuzzy sets in a tripartite graph. The fuzzy triadic similarity (FT-Sim) model can handle the uncertainty represented by fuzzy sets. Moreover, with the growth of the Web and the wide availability of storage, more and more documents are becoming accessible. Documents may come from multiple sites, which makes similarity computation expensive. This problem motivated us to use parallel computing. In this paper, we introduce parallel architectures that can process large, multi-source data sets through a sequential, a merging-based, or a splitting-based process. We then perform local and central (or global) computation using the basic FT-Sim measure. The idea behind these architectures is to reduce both time and space complexity through parallel computation.
