Unsupervised spectral learning of FSTs

Finite-State Transducers (FSTs) are a standard tool for modeling paired input-output sequences and are used in numerous applications, ranging from computational biology to natural language processing. Recently, Balle et al. [4] presented a spectral algorithm for learning FSTs from samples of aligned input-output sequences. In this paper we address the more realistic, yet more challenging, setting where the alignments are unknown to the learning algorithm. We frame FST learning as finding a low-rank Hankel matrix that satisfies constraints derived from observable statistics. Under this formulation, we provide identifiability results for FST distributions. Then, following previous work on rank minimization, we propose a regularized convex relaxation of this objective, which minimizes a nuclear norm penalty subject to linear constraints and can be solved efficiently.
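To illustrate why a low-rank Hankel matrix is a natural target, the sketch below builds the Hankel matrix of a small weighted automaton and checks that its rank is bounded by the number of states. This is not the algorithm of the paper: the 2-state automaton, its weights, and the helper names (`strings_up_to`, `automaton_value`, `hankel_matrix`) are hypothetical and chosen only for illustration, and the alphabet is a plain one rather than the aligned input-output pairs an FST would use.

```python
import itertools
import numpy as np


def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len, shortest first."""
    return ["".join(p)
            for n in range(max_len + 1)
            for p in itertools.product(alphabet, repeat=n)]


def automaton_value(alpha0, operators, alpha_inf, string):
    """Value of the automaton on `string`: alpha0^T A_x1 ... A_xn alpha_inf."""
    state = alpha0
    for symbol in string:
        state = state @ operators[symbol]
    return float(state @ alpha_inf)


def hankel_matrix(alpha0, operators, alpha_inf, prefixes, suffixes):
    """Hankel block with entry (p, s) equal to the value of the string p + s."""
    return np.array([[automaton_value(alpha0, operators, alpha_inf, p + s)
                      for s in suffixes]
                     for p in prefixes])


# Hypothetical 2-state weighted automaton (illustrative weights only).
alpha0 = np.array([1.0, 0.0])
alpha_inf = np.array([0.2, 0.3])
operators = {
    "a": np.array([[0.3, 0.2], [0.1, 0.2]]),
    "b": np.array([[0.1, 0.2], [0.3, 0.1]]),
}

basis = strings_up_to("ab", 2)  # the empty string plus all strings of length 1 and 2
H = hankel_matrix(alpha0, operators, alpha_inf, basis, basis)
rank = np.linalg.matrix_rank(H)
print(H.shape, rank)
```

Each entry of the block factors as (alpha0^T A_p)(A_s alpha_inf), so the block is a product of a matrix with two columns and a matrix with two rows, and its rank cannot exceed the number of states. A nuclear norm penalty is a standard convex surrogate for that rank constraint.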

[1]  Dean Alderucci. A Spectral Algorithm for Learning Hidden Markov Models That Have Silent States, 2015.

[2]  Karl Stratos, et al. Spectral Learning of Latent-Variable PCFGs, 2012, ACL.

[3]  Le Song, et al. Hilbert Space Embeddings of Hidden Markov Models, 2010, ICML.

[4]  G. Stewart, et al. Matrix Perturbation Theory, 1990.

[5]  Alexander Clark. Partially Supervised Learning of Morphology with Stochastic Transducers, 2001, NLPRS.

[6]  Byron Boots, et al. Reduced-Rank Hidden Markov Models, 2009, AISTATS.

[7]  Mehryar Mohri, et al. Spectral Learning of General Weighted Automata via Constrained Matrix Completion, 2012, NIPS.

[8]  Gilles Blanchard, et al. On the Convergence of Eigenspaces in Kernel Principal Component Analysis, 2005, NIPS.

[9]  Jason Eisner, et al. Parameter Estimation for Probabilistic Finite-State Transducers, 2002, ACL.

[10]  Ariadna Quattoni, et al. Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion, 2013, EMNLP.

[11]  Le Song, et al. A Spectral Algorithm for Latent Tree Graphical Models, 2011, ICML.

[12]  Ariadna Quattoni, et al. A Spectral Learning Algorithm for Finite State Transducers, 2011, ECML/PKDD.

[13]  Ariadna Quattoni, et al. Local Loss Optimization in Operator Models: A New Insight into Spectral Learning, 2012, ICML.

[14]  Marc Sebban, et al. A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer, 2006, ICGI.

[15]  Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2009, Int. J. Robotics Res.

[16]  Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2011, Int. J. Robotics Res.

[17]  Francisco Casacuberta. Inference of Finite-State Transducers by Using Regular Grammars and Morphisms, 2000, ICGI.

[18]  Liva Ralaivola, et al. Grammatical inference as a principal component analysis problem, 2009, ICML '09.

[19]  Mehryar Mohri, et al. Finite-State Transducers in Language and Speech Processing, 1997, CL.