A feature-free and parameter-light multi-task clustering framework

The last two decades have witnessed extensive research on multi-task learning algorithms in domains as diverse as bioinformatics, text mining, natural language processing, and image and video content analysis. However, existing multi-task learning methods require either domain-specific knowledge to extract features or the careful setting of many input parameters. Both requirements carry significant disadvantages; most obviously, poorly extracted features or incorrectly set parameters can lead us to discover spurious or non-existent patterns. In this work, we propose a feature-free and parameter-light multi-task clustering framework that overcomes these disadvantages. Our proposal is motivated by the recent success of Kolmogorov complexity-based methods across a variety of applications. Such methods, however, are defined only for single-task problems because they lack a mechanism for sharing knowledge between tasks. To address this limitation, we introduce a novel dictionary-based compression dissimilarity measure that allows knowledge to be shared effectively across tasks. Experimental results with extensive comparisons demonstrate the generality and effectiveness of our proposal.
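The core idea of a compression-based dissimilarity can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the paper's implementation: it approximates the compressor C(.) with an LZW-style code-length count and plugs it into a Normalized Compression Distance-style ratio, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The optional `shared` argument, which pre-seeds the dictionary with phrases drawn from other tasks' data, is a hypothetical stand-in for the cross-task knowledge-sharing mechanism described in the abstract.

```python
def _lzw_phrases(text):
    """Collect the phrases an LZW-style scan of `text` would learn."""
    phrases = {ch for ch in text}
    w = ""
    for ch in text:
        if w + ch in phrases:
            w = w + ch
        else:
            phrases.add(w + ch)
            w = ch
    return phrases


def lzw_code_length(text, shared=None):
    """Approximate compressed size of `text` as the number of LZW codes emitted.

    `shared` is an optional iterable of strings from *other* tasks whose
    phrases pre-seed the dictionary; this is a hypothetical stand-in for
    cross-task knowledge sharing, not the paper's exact mechanism.
    """
    dictionary = {ch for ch in text}
    for s in (shared or []):
        dictionary |= _lzw_phrases(s)
    codes = 0
    w = ""
    for ch in text:
        if w + ch in dictionary:
            w = w + ch
        else:
            codes += 1                 # emit a code for the current phrase w
            dictionary.add(w + ch)     # learn the new phrase
            w = ch
    if w:
        codes += 1                     # flush the final phrase
    return codes


def compression_dissimilarity(x, y, shared=None):
    """NCD-style dissimilarity: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = lzw_code_length(x, shared)
    cy = lzw_code_length(y, shared)
    cxy = lzw_code_length(x + y, shared)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog " * 4
    b = "the quick brown fox naps under the lazy dog " * 4
    c = "lorem ipsum dolor sit amet consectetur adipiscing " * 4
    # Texts that share many phrases should typically come out less dissimilar.
    print(compression_dissimilarity(a, b))
    print(compression_dissimilarity(a, c))
```

Because the measure is computed directly from raw strings, no feature extraction is needed, and the only tunable choice in this sketch is which (if any) shared data seeds the dictionary.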
