Boosting Relational Sequence Alignments

The task of aligning sequences arises in many applications. Classical dynamic programming approaches require explicit state enumeration in the reward model. This is often impractical: the number of states grows very quickly with the number of domain objects and the relations among these objects. Relational sequence alignment aims to exploit symbolic structure to avoid the full enumeration. This comes at the expense of a more complex reward model selection problem: virtually infinitely many abstraction levels have to be explored. In this paper, we apply gradient-based boosting to address this problem. Specifically, we show how to reduce the learning problem to a series of relational regression problems. The main benefit is that interactions between state variables are introduced only as needed, so that the potentially infinite search space is never considered explicitly. As our experimental results show, this boosting approach can significantly improve upon established results in challenging applications.
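To make the contrast between an enumerated reward table and a symbolic reward model concrete, the following is a minimal sketch of global alignment by dynamic programming in which the reward is a function over ground atoms rather than a lookup table. The atom encoding, the hand-written `reward` function, and its numeric values are illustrative assumptions and are not taken from the paper; in the approach described above, the reward model would instead be learned as a sum of relational regression models.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: a ground atom is encoded as (predicate, arg1, arg2, ...).
Atom = Tuple[str, ...]


def reward(a: Optional[Atom], b: Optional[Atom]) -> float:
    """Toy symbolic reward (hypothetical values); None stands for a gap."""
    if a is None or b is None:
        return -1.0  # gap penalty
    if a == b:
        return 2.0   # identical atoms
    if a[0] == b[0]:
        return 1.0   # same predicate, different arguments
    return -1.0      # different predicates


def align_score(
    xs: Sequence[Atom],
    ys: Sequence[Atom],
    r: Callable[[Optional[Atom], Optional[Atom]], float] = reward,
) -> float:
    """Score of the best global alignment of xs and ys under reward r."""
    n, m = len(xs), len(ys)
    score: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty sequence uses gaps only.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + r(xs[i - 1], None)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + r(None, ys[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + r(xs[i - 1], ys[j - 1]),  # match or mismatch
                score[i - 1][j] + r(xs[i - 1], None),           # gap in ys
                score[i][j - 1] + r(None, ys[j - 1]),           # gap in xs
            )
    return score[n][m]
```

Because `reward` inspects only the predicate and arguments of each atom, it covers every pair of atoms without listing them, which is the kind of abstraction the enumeration-based formulation cannot express.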
