Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining

The notion of meta-mining has appeared recently and extends traditional meta-learning in two ways. First it provides support for the whole data-mining process. Second it pries open the so called algorithm black-box approach where algorithms and workflows also have descriptors. With the availability of descriptors both for datasets and data-mining workflows we are faced with a problem the nature of which is much more similar to those appearing in recommendation systems. In order to account for the meta-mining specificities we derive a novel metric-based-learning recommender approach. Our method learns two homogeneous metrics, one in the dataset and one in the workflow space, and a heterogeneous one in the dataset-workflow space. All learned metrics reflect similarities established from the dataset-workflow preference matrix. The latter is constructed from the performance results obtained by the application of workflows to datasets. We demonstrate our method on meta-mining over biological (microarray datasets) problems. The application of our method is not limited to the meta-mining problem, its formulation is general enough so that it can be applied on problems with similar requirements.

[1]  Melanie Hilario,et al.  Fusion of Meta-knowledge and Meta-data for Case-Based Model Selection , 2001, PKDD.

[2]  Taghi M. Khoshgoftaar,et al.  A Survey of Collaborative Filtering Techniques , 2009, Adv. Artif. Intell..

[3]  Charles C. Taylor,et al.  Meta-Analysis: From Data Characterisation for Meta-Learning to Meta-Regression , 2000 .

[4]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[5]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[6]  Cao Feng,et al.  STATLOG: COMPARISON OF CLASSIFICATION ALGORITHMS ON LARGE REAL-WORLD PROBLEMS , 1995 .

[7]  Thore Graepel,et al.  Matchbox: large scale online bayesian recommendations , 2009, WWW '09.

[8]  Deepak Agarwal,et al.  fLDA: matrix factorization through latent dirichlet allocation , 2010, WSDM '10.

[9]  金田 重郎,et al.  C4.5: Programs for Machine Learning (書評) , 1995 .

[10]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[11]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[12]  Ricardo Vilalta,et al.  Introduction to the Special Issue on Meta-Learning , 2004, Machine Learning.

[13]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[14]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[15]  Robin D. Burke,et al.  Hybrid Recommender Systems: Survey and Experiments , 2002, User Modeling and User-Adapted Interaction.

[16]  Marko Robnik-Sikonja,et al.  Theoretical and Empirical Analysis of ReliefF and RReliefF , 2003, Machine Learning.

[17]  João Gama,et al.  On Data and Algorithms: Understanding Inductive Performance , 2004, Machine Learning.

[18]  Ricardo Vilalta,et al.  Using Meta-Learning to Support Data Mining , 2004, Int. J. Comput. Sci. Appl..

[19]  Kate Smith-Miles,et al.  Cross-disciplinary perspectives on meta-learning for algorithm selection , 2009, CSUR.

[20]  T. Ho,et al.  Data Complexity in Pattern Recognition , 2006 .

[21]  Aixia Guo,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2014 .

[22]  Carlos Soares,et al.  Zoomed Ranking: Selection of Classification Algorithms Based on Relevant Performance Information , 2000, PKDD.

[23]  Melanie Hilario,et al.  Ontology-Based Meta-Mining of Knowledge Discovery Workflows , 2011, Meta-Learning in Computational Intelligence.

[24]  Pat Langley,et al.  Induction of One-Level Decision Trees , 1992, ML.

[25]  Alexandros Kalousis,et al.  NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection , 1999, Intell. Data Anal..

[26]  Mohammed J. Zaki Efficiently mining frequent trees in a forest: algorithms and applications , 2005, IEEE Transactions on Knowledge and Data Engineering.

[27]  Ingo Mierswa,et al.  YALE: rapid prototyping for complex data mining tasks , 2006, KDD '06.

[28]  Ricardo Vilalta,et al.  Metalearning - Applications to Data Mining , 2008, Cognitive Technologies.

[29]  Hilan Bensusan,et al.  Meta-Learning by Landmarking Various Learning Algorithms , 2000, ICML.

[30]  Tim Oates,et al.  A Review of Recent Research in Metareasoning and Metalearning , 2007, AI Mag..

[31]  Deepak Agarwal,et al.  Regression-based latent factor models , 2009, KDD.