A general regression technique for learning transductions

Learning a transduction, that is, a string-to-string mapping, is a common problem in natural language processing and computational biology. Previous methods for learning such mappings are based on classification techniques. This paper presents a new and general regression technique for learning transductions. Our approach consists of two phases: the estimation of a set of regression coefficients and the computation of the pre-image corresponding to those coefficients. A novel and conceptually cleaner formulation of kernel dependency estimation provides a simple framework for estimating the regression coefficients, and an efficient algorithm for computing the pre-image from these coefficients extends the applicability of kernel dependency estimation to output sequences. We report the results of a series of experiments illustrating the effectiveness of this regression technique for learning transductions.
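
A minimal sketch of the two-phase approach, in Python, under illustrative assumptions: a bigram count kernel stands in for the sequence kernels used in the paper, the regression coefficients are estimated with standard dual kernel ridge regression, and the pre-image is found by brute-force search over a small candidate set rather than by the paper's efficient pre-image algorithm. All function names and the toy data are hypothetical.

import numpy as np
from collections import Counter

def ngram_features(s, n=2):
    # Map a string to its n-gram count vector (as a sparse dict).
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def kernel(s, t, n=2):
    # N-gram count kernel: inner product of the two count vectors.
    fs, ft = ngram_features(s, n), ngram_features(t, n)
    return float(sum(c * ft.get(g, 0) for g, c in fs.items()))

def gram(xs, ys):
    return np.array([[kernel(x, y) for y in ys] for x in xs])

def fit(X, lam=0.1):
    # Phase 1: dual kernel ridge regression. The returned matrix maps the
    # vector of input-kernel values k_x(x) to a weight vector over the
    # training outputs in the output feature space.
    m = len(X)
    return np.linalg.solve(gram(X, X) + lam * np.eye(m), np.eye(m))

def predict(x, X, Y, alpha, candidates):
    # Phase 2: pre-image computation. The regression predicts the point
    # sum_i w_i Phi(y_i) in the output feature space; return the candidate
    # string y whose image Phi(y) is closest to that point, with distances
    # expressed purely through output-kernel evaluations.
    w = gram([x], X)[0] @ alpha
    def distance(y):
        return kernel(y, y) - 2.0 * sum(w[i] * kernel(y, Y[i]) for i in range(len(Y)))
    return min(candidates, key=distance)

# Toy usage on a hypothetical mapping from 'abc'-strings to 'xyz'-strings.
X = ["abcabc", "abcab", "ababc"]
Y = ["xyzxyz", "xyzxy", "xyxyz"]
alpha = fit(X)
print(predict("abcab", X, Y, alpha, candidates=Y))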
