Chinese-Tibetan bilingual clustering based on random walk

Multi-source clustering has received significant attention in recent years, and several methods have been developed from different perspectives. This paper addresses the problem of Chinese-Tibetan bilingual document clustering and proposes a novel bilingual clustering scheme that captures both intralingual document structures and interlingual document relations. The scheme consists of three phases. First, a bilingual graph is constructed to combine the feature structures of documents in the two languages. Second, two bilingual similarity matrices are computed from a random walk performed on the bilingual graph. Finally, similarity-based clustering methods are applied to the two bilingual similarity matrices to generate cluster structures for the documents in each language, which yields the corresponding bilingual clustering methods. Extensive experiments on two Chinese-Tibetan bilingual document sets confirm the effectiveness of the proposed methods.
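
The first two phases can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not specify how the graph is weighted or which random walk is used, so the sketch assumes cosine similarity for intralingual edges, a given matrix of cross-language link weights (`links`, hypothetical, e.g. derived from aligned or translated document pairs), and a random walk with restart. All function names are illustrative.

```python
import numpy as np

def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    return Xn @ Xn.T

def build_bilingual_graph(S_zh, S_bo, links):
    """Assumed graph layout: Chinese nodes first, then Tibetan nodes.
    links[i, j] is the weight between Chinese doc i and Tibetan doc j."""
    return np.block([[S_zh, links], [links.T, S_bo]])

def random_walk_similarity(W, restart=0.15):
    """Random walk with restart (an assumed choice of walk).
    Returns a symmetrized node-to-node similarity matrix."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)                  # no self-loops
    deg = W.sum(axis=1, keepdims=True)
    P = W / np.where(deg == 0, 1.0, deg)      # row-stochastic transitions
    n = W.shape[0]
    R = restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)
    return (R + R.T) / 2.0

def bilingual_similarities(W, n_zh, restart=0.15):
    """Split the walk-based similarity into one matrix per language."""
    S = random_walk_similarity(W, restart)
    return S[:n_zh, :n_zh], S[n_zh:, n_zh:]

# Toy data: three Chinese documents with no shared terms, and three
# Tibetan documents of which the first two overlap.
X_zh = np.eye(3)
X_bo = np.array([[1.0, 0.5, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
links = np.eye(3)                             # doc i in Chinese <-> doc i in Tibetan

W = build_bilingual_graph(cosine_similarity(X_zh), cosine_similarity(X_bo), links)
sim_zh, sim_bo = bilingual_similarities(W, n_zh=3)
```

In this toy setup the first two Chinese documents share no terms, yet they receive a positive similarity because the walk can pass through their linked Tibetan counterparts, which is the kind of interlingual relation the scheme aims to exploit. For the third phase, `sim_zh` and `sim_bo` could be passed to any similarity-based clustering method, for example affinity propagation or spectral clustering.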
