Rule Ensemble Learning Using Hierarchical Kernels in Structured Output Spaces

The goal in Rule Ensemble Learning (REL) is the simultaneous discovery of a small set of simple rules and their optimal weights that lead to good generalization. Rules are assumed to be conjunctions of basic propositions concerning the values taken by the input features. It has been shown that rule ensembles for classification can be learnt optimally and efficiently using hierarchical kernel learning approaches, which explore the exponentially large space of conjunctions by exploiting its hierarchical structure. The regularizer employed penalizes long conjunctions and thereby selects a small set of short rules. In this paper, we generalize the rule ensemble learning using hierarchical kernels (RELHKL) framework to multi-class structured output spaces. We build on the StructSVM model for sequence prediction problems and employ a ρ-norm hierarchical regularizer for observation features and a conventional 2-norm regularizer for state transition features. The exponentially large feature space is searched using an active set algorithm, and the exponentially large set of constraints is handled using a cutting plane algorithm. The approach extends readily to other structured output problems. We perform experiments on activity recognition datasets, which are prone to noise, sparseness and skewness, and demonstrate that our approach outperforms the other approaches evaluated.
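To make the cutting-plane idea concrete, below is a minimal, illustrative sketch of constraint generation for a margin-based sequence labeler — not the paper's implementation. All names are hypothetical; brute-force loss-augmented inference stands in for Viterbi decoding, a simple subgradient pass over the working set stands in for the QP solve, and the hierarchical ρ-norm regularizer over conjunctions is omitted (only plain emission and transition features appear).

```python
import itertools
import numpy as np

N_LABELS = 2  # binary tags per sequence position (toy setting)

def joint_feature(x, y):
    """Joint feature map phi(x, y): per-label emission sums plus transition counts."""
    n_feat = x.shape[1]
    phi = np.zeros(N_LABELS * n_feat + N_LABELS * N_LABELS)
    for t, lab in enumerate(y):
        phi[lab * n_feat:(lab + 1) * n_feat] += x[t]        # emission block
        if t > 0:
            phi[N_LABELS * n_feat + N_LABELS * y[t - 1] + lab] += 1  # transition count
    return phi

def hamming_loss(y, y_hat):
    return sum(a != b for a, b in zip(y, y_hat))

def most_violated(w, x, y):
    # Loss-augmented inference: argmax over label sequences of
    # score + Hamming loss. Brute force here; Viterbi in practice.
    return max(itertools.product(range(N_LABELS), repeat=len(y)),
               key=lambda c: w @ joint_feature(x, c) + hamming_loss(y, c))

def cutting_plane_train(data, C=1.0, lr=0.1, epochs=20):
    """Grow a working set of most-violated constraints, then re-fit on it."""
    dim = N_LABELS * data[0][0].shape[1] + N_LABELS * N_LABELS
    w = np.zeros(dim)
    working_set = []
    for _ in range(epochs):
        for x, y in data:
            y_hat = most_violated(w, x, y)
            if y_hat != y:                      # new violated constraint
                working_set.append((x, y, y_hat))
        # Subgradient pass over the working set (stand-in for the QP solve).
        for x, y, y_hat in working_set:
            delta = joint_feature(x, y) - joint_feature(x, y_hat)
            if hamming_loss(y, y_hat) - w @ delta > 0:  # still violated
                w += lr * C * delta
    return w
```

In the full approach the inner re-fit would be a regularized QP (2-norm on transition weights, ρ-norm hierarchical penalty on conjunction features), with the active set algorithm deciding which conjunctions enter the model at all.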
