Efficient Feature Construction by Meta Learning – Guiding the Search in Meta Hypothesis Space

Choosing the right internal representation of examples and hypotheses is a key issue for many learning problems. Feature construction is an approach to finding such a representation independently of the underlying learning algorithm. Unfortunately, constructing features usually implies searching a very large space of possibilities and is often computationally demanding. In this work, we propose an approach to feature construction based on meta learning. Learning tasks are stored together with a corresponding set of constructed features in a case base. This case base is then used to constrain and guide the feature construction for new tasks. Our method essentially consists of a new representation model for learning tasks and a corresponding two-step distance measure. Our approach is unique in that it enables case-based feature construction not only on a large scale, but also in distributed learning scenarios in which communication cost plays an important role. Using this two-step process, the accuracy of recommendations can be increased without losing the benefits of efficiency. The theoretical results are confirmed by experiments on both synthetic data and data obtained from a distributed learning scenario on audio data.
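To make the retrieval scheme concrete, the following is a minimal sketch in Python of case-based feature recommendation with a two-step distance. All names here (`TaskCase`, `coarse_distance`, `fine_distance`, `recommend_features`) and the concrete distance functions are illustrative assumptions, not the paper's actual representation model or task distance.

```python
import math
from dataclasses import dataclass

@dataclass
class TaskCase:
    """A stored learning task: a compact signature plus its constructed features.

    This representation is a hypothetical stand-in for the paper's task model.
    """
    signature: list[float]     # cheap meta-features (e.g. number of attributes, class entropy)
    sample: list[list[float]]  # small data sample kept for the expensive comparison
    features: list[str]        # feature-construction operators that worked well for this task

def coarse_distance(a: list[float], b: list[float]) -> float:
    """Step 1: a cheap Euclidean distance on task signatures (low communication cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fine_distance(sample_a: list[list[float]], sample_b: list[list[float]]) -> float:
    """Step 2: a more expensive distance on the data samples themselves.

    A simple mean nearest-point distance is used here as a placeholder for
    the paper's actual task distance measure.
    """
    total = 0.0
    for xa in sample_a:
        total += min(coarse_distance(xa, xb) for xb in sample_b)
    return total / len(sample_a)

def recommend_features(case_base: list[TaskCase], query: TaskCase, k: int = 5) -> list[str]:
    """Two-step retrieval: prefilter the case base with the cheap distance,
    then rank only the k survivors with the expensive one and recommend the
    constructed features of the closest stored task."""
    candidates = sorted(
        case_base, key=lambda c: coarse_distance(c.signature, query.signature)
    )[:k]
    best = min(candidates, key=lambda c: fine_distance(c.sample, query.sample))
    return best.features
```

The design mirrors the claimed efficiency benefit: in a distributed setting, only the compact task signatures need to be exchanged for the cheap first step, and the expensive comparison is carried out only against the few candidates that survive the prefilter.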
