Identification with Probability One of Stochastic Deterministic Linear Languages

Learning context-free grammars is generally considered a very hard task, and all the more so when learning must be done from positive examples only. In this setting, one possibility is to learn stochastic context-free grammars, under the implicit assumption that the examples are drawn from a distribution defined by such a grammar. Nevertheless, this remains a hard task for which no algorithm is known. We use recent results to introduce a proper subclass of linear grammars, called deterministic linear grammars, for which we prove that a small canonical form can be found. The existence of such a canonical form has previously proved to be a useful condition for a learning algorithm to be possible. We propose an algorithm for this class of grammars and prove that it runs in polynomial time and structurally converges to the target in the paradigm of identification in the limit with probability 1. Although this does not ensure that a polynomial-size sample suffices for learning, we argue that the criterion means that no added (hidden) bias is present.
