Locality kernels for sequential data and their applications to parse ranking

We propose a framework for constructing kernels that take advantage of local correlations in sequential data. Kernels designed with this framework measure parse similarities locally, within a small window constructed around each matching feature. We further propose incorporating positional information inside the window and consider different ways of doing so. We applied the kernels together with the regularized least-squares (RLS) algorithm to the task of dependency parse ranking, using a dataset of parses obtained from a manually annotated biomedical corpus of 1100 sentences. Our experiments show that RLS with kernels incorporating positional information performs better than RLS with the baseline kernel functions, and that this performance gain is statistically significant.
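
To make the idea of a window around each matching feature concrete, the following is a minimal sketch, not the kernels defined in the paper. It assumes each parse has been flattened into a sequence of hashable features. The function name `locality_kernel`, the parameters `window` and `sigma`, and the Gaussian weighting of offsets are all illustrative assumptions; the abstract only says that positional information inside the window can be incorporated in different ways.

```python
import math


def locality_kernel(s, t, window=2, sigma=None):
    """Sum local window similarities over all pairs of matching positions.

    s, t   : sequences of hashable features (hypothetical flattened parses)
    window : number of neighbours inspected on each side of a match
    sigma  : if given, a neighbour at offset d is weighted by
             exp(-d^2 / (2 * sigma^2)); if None, every offset weighs 1.0
             (an assumed positional scheme, not taken from the paper)
    """
    def weight(d):
        if sigma is None:
            return 1.0
        return math.exp(-(d * d) / (2.0 * sigma ** 2))

    total = 0.0
    for i, a in enumerate(s):
        for j, b in enumerate(t):
            if a != b:
                continue  # only matching features open a window
            # Compare the two windows centred on the match (i, j),
            # offset by offset, skipping offsets that fall off either end.
            for d in range(-window, window + 1):
                if 0 <= i + d < len(s) and 0 <= j + d < len(t):
                    if s[i + d] == t[j + d]:
                        total += weight(d)
    return total


# Toy usage with made-up feature sequences.
s = ["a", "b", "c"]
t = ["a", "b", "d"]
plain = locality_kernel(s, t, window=1)                  # unweighted window
positional = locality_kernel(s, t, window=1, sigma=1.0)  # offsets down-weighted
```

With `sigma=None` every agreeing neighbour inside the window counts equally. Setting `sigma` makes agreement close to the matching feature count more than agreement near the window edge, which is one simple way to use position inside the window.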
