Learning Rational Stochastic Tree Languages

We consider the problem of learning stochastic tree languages, i.e. probability distributions over a set of trees $T({\cal F})$, from a sample of trees drawn independently according to an unknown target distribution $P$. We focus on the case where the target is a rational stochastic tree language, i.e. one that can be computed by a rational tree series or, equivalently, by a multiplicity tree automaton. We make two contributions. First, we show that rational tree series admit a canonical representation whose parameters can be efficiently estimated from samples. Second, we give an inference algorithm that identifies the class of rational stochastic tree languages in the limit with probability one.
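To make the notion of a multiplicity tree automaton concrete, the following minimal sketch evaluates a tree series bottom-up: each symbol of arity k is given a multilinear map on k vectors, and a final linear form turns the vector computed at the root into a number. The tree encoding, the names `tree_vector` and `series_value`, and the one-dimensional example over a hypothetical alphabet {a (arity 0), f (arity 2)} are illustrative assumptions, not the paper's canonical representation or its inference algorithm.

```python
from fractions import Fraction

# Assumed encoding: a tree is a tuple (symbol, child_1, ..., child_k).
# mu[symbol] is a sparse tensor {(i, j_1, ..., j_k): weight} describing a
# multilinear map from k vectors of size `dim` to one vector of size `dim`.

def tree_vector(tree, mu, dim):
    """Compute the vector attached to `tree` by bottom-up evaluation."""
    symbol, *children = tree
    child_vecs = [tree_vector(c, mu, dim) for c in children]
    out = [Fraction(0)] * dim
    for (i, *js), weight in mu[symbol].items():
        term = weight
        # Multiply the weight by the selected coordinate of each child vector.
        for vec, j in zip(child_vecs, js):
            term *= vec[j]
        out[i] += term
    return out

def series_value(tree, mu, lam, dim):
    """Apply the linear form `lam` to the vector computed at the root."""
    vec = tree_vector(tree, mu, dim)
    return sum(l * v for l, v in zip(lam, vec))

# Hypothetical one-dimensional automaton: a leaf has weight 3/4 and a binary
# node has weight 1/4, so a tree with n binary nodes gets (3/4)^(n+1) * (1/4)^n.
DIM = 1
mu = {
    "a": {(0,): Fraction(3, 4)},
    "f": {(0, 0, 0): Fraction(1, 4)},
}
lam = [Fraction(1)]

leaf = ("a",)
pair = ("f", leaf, leaf)
nested = ("f", pair, leaf)

p_leaf = series_value(leaf, mu, lam, DIM)      # 3/4
p_pair = series_value(pair, mu, lam, DIM)      # 9/64
p_nested = series_value(nested, mu, lam, DIM)  # 27/1024
```

In this toy example the weights describe a branching process in which every node is independently a leaf with probability 3/4 and a binary node with probability 1/4, so the series assigns each finite binary tree its probability under that process. Exact `Fraction` arithmetic is used only to keep the illustration free of rounding.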
