A general regression technique for learning transductions

Learning a transduction, that is, a string-to-string mapping, is a common problem in natural language processing and computational biology. Previous methods for learning such mappings are based on classification techniques. This paper presents a new and general regression technique for learning transductions. Our approach consists of two phases: the estimation of a set of regression coefficients and the computation of the pre-image corresponding to those coefficients. A novel and conceptually cleaner formulation of kernel dependency estimation provides a simple framework for estimating the regression coefficients, and an efficient algorithm for computing the pre-image from these coefficients extends the applicability of kernel dependency estimation to output sequences. We report the results of a series of experiments illustrating the effectiveness of this regression technique for learning transductions.
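
A minimal sketch of the two-phase approach, in Python, under illustrative assumptions: a bigram count kernel stands in for the sequence kernels used in the paper, the regression coefficients are estimated with standard dual kernel ridge regression, and the pre-image is found by brute-force search over a small candidate set rather than by the paper's efficient pre-image algorithm. All function names and the toy data are hypothetical.

import numpy as np
from collections import Counter

def ngram_features(s, n=2):
    # Map a string to its n-gram count vector (as a sparse dict).
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def kernel(s, t, n=2):
    # N-gram count kernel: inner product of the two count vectors.
    fs, ft = ngram_features(s, n), ngram_features(t, n)
    return float(sum(c * ft.get(g, 0) for g, c in fs.items()))

def gram(xs, ys):
    return np.array([[kernel(x, y) for y in ys] for x in xs])

def fit(X, lam=0.1):
    # Phase 1: dual kernel ridge regression. The returned matrix maps the
    # vector of input-kernel values k_x(x) to a weight vector over the
    # training outputs in the output feature space.
    m = len(X)
    return np.linalg.solve(gram(X, X) + lam * np.eye(m), np.eye(m))

def predict(x, X, Y, alpha, candidates):
    # Phase 2: pre-image computation. The regression predicts the point
    # sum_i w_i Phi(y_i) in the output feature space; return the candidate
    # string y whose image Phi(y) is closest to that point, with distances
    # expressed purely through output-kernel evaluations.
    w = gram([x], X)[0] @ alpha
    def distance(y):
        return kernel(y, y) - 2.0 * sum(w[i] * kernel(y, Y[i]) for i in range(len(Y)))
    return min(candidates, key=distance)

# Toy usage on a hypothetical mapping from 'abc'-strings to 'xyz'-strings.
X = ["abcabc", "abcab", "ababc"]
Y = ["xyzxyz", "xyzxy", "xyxyz"]
alpha = fit(X)
print(predict("abcab", X, Y, alpha, candidates=Y))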
