Grid-Based Fuzzy Processing for Parallel Learning the Document Similarities

Document co-clustering methods efficiently capture high-order similarities between the objects described by the rows and columns of a data matrix. Alouane et al. (2013) proposed a method for the simultaneous computation of similarity matrices between objects (documents or sentences) and between descriptors (sentences or words), each matrix being built from the other, according to a fuzzy triadic model based on a three-partite graph. With the growth of the Web and the wide availability of storage space, documents have become ever more accessible, which makes fuzzy computation very expensive. In the present case, developing fuzzification algorithms requires a deployment platform with sufficient processing power. A grid architecture seems an appropriate answer to this need, since it distributes the processing over all the machines of the platform, creating the illusion of a single virtual computer able to solve large computing problems that would require very long run times on a single machine. The authors propose to enhance similarity learning through upstream and downstream parallel processing. The first deploys the fuzzy linear model in a grid environment. The second handles multi-view datasets, introducing different architectures that use several instances of a fuzzy triadic similarity algorithm.
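
The idea of two similarity matrices, each built from the other, can be illustrated with a small sketch. This is not the fuzzy triadic model of Alouane et al. (2013), whose formulas are not given here; it is a generic mutual-reinforcement iteration over a document-by-word matrix, with a cosine-style normalization chosen for simplicity. The function names (`co_similarity`, `normalize`) and the toy matrix are assumptions made for illustration only.

```python
from math import sqrt


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def identity(n):
    """n x n identity matrix: each item is initially similar only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def normalize(s):
    """Scale a square matrix so that every non-zero diagonal entry becomes 1.

    Assumption: a cosine-style normalization, not the paper's own scheme.
    """
    d = [sqrt(s[i][i]) for i in range(len(s))]
    return [[s[i][j] / (d[i] * d[j]) if d[i] and d[j] else 0.0
             for j in range(len(s))]
            for i in range(len(s))]


def co_similarity(m, iterations=3):
    """Alternately update row (document) and column (word) similarities.

    m is a non-negative document-by-word matrix. Each pass rebuilds the
    document similarities from the current word similarities, then the
    word similarities from the new document similarities.
    """
    mt = transpose(m)
    sr = identity(len(m))      # document-document similarities
    sc = identity(len(m[0]))   # word-word similarities
    for _ in range(iterations):
        sr = normalize(matmul(matmul(m, sc), mt))
        sc = normalize(matmul(matmul(mt, sr), m))
    return sr, sc


if __name__ == "__main__":
    # Toy corpus: documents 0 and 2 share no word, but both overlap document 1.
    docs = [[1, 1, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 1, 1]]
    sr, _ = co_similarity(docs, iterations=2)
    for row in sr:
        print([round(v, 3) for v in row])
```

In the toy corpus, documents 0 and 2 have no word in common, so their similarity is zero after one pass and becomes positive from the second pass on, once the word similarities learned through document 1 feed back into the document matrix. This is the kind of high-order similarity the abstract refers to. Each matrix product is also the expensive step that a grid deployment would aim to distribute.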
