A Compression-Based Dissimilarity Measure for Multi-task Clustering

Virtually all existing multi-task learning methods for string data require either domain-specific knowledge to extract feature representations or careful tuning of many input parameters. In this work, we propose a feature-free and parameter-light multi-task clustering algorithm for string data. To transfer knowledge between domains, we introduce a novel dictionary-based compression dissimilarity measure. Experimental results with extensive comparisons demonstrate the generality and effectiveness of our proposal.
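The paper's own measure is dictionary-based and is not reproduced here; as a rough illustration of how a feature-free, compression-based dissimilarity works in general, the following is a minimal sketch of the normalized compression distance (NCD), using zlib as an off-the-shelf compressor. The function names and the choice of compressor are ours, not the paper's.

```python
import random
import zlib


def compressed_len(s: bytes) -> int:
    # Length of s after zlib compression at the highest level; a stand-in
    # for C(s), the compressed size of s, in the NCD formula.
    return len(zlib.compress(s, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance:
    #   NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    # Values near 0 mean x and y share much structure (one helps compress
    # the other); values near 1 mean they share almost none.
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Highly similar strings score lower than unrelated ones.
repeated = b"the quick brown fox " * 100
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(2000))
assert ncd(repeated, repeated) < ncd(repeated, noise)
```

The appeal of such measures for string clustering is that no feature extraction or parameter tuning is needed: any standard compressor supplies the dissimilarity directly, which is the spirit of the feature-free, parameter-light approach advocated above.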
