Probabilistic k-Testable Tree Languages

In this paper, we present a natural generalization of k-gram models to stochastic tree languages, based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One advantage of this approach is that the model can be updated incrementally. The method is an alternative to costly learning algorithms (such as inside-outside-based methods) or algorithms that require larger samples (such as many state merging/splitting methods).
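
The frequency-based estimation described above can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: trees are nested tuples `(label, child, ...)`, the state of a subtree is its truncation to depth k-1, a transition is identified by the truncation to depth k, and probabilities are plain relative frequencies with no smoothing. The names `truncate`, `subtrees`, and `KTestableTreeModel` are hypothetical.

```python
from collections import Counter


def truncate(tree, depth):
    """Keep only the top `depth` levels of a tree given as (label, child, ...)."""
    label, children = tree[0], tree[1:]
    if depth <= 1:
        return (label,)
    return (label,) + tuple(truncate(c, depth - 1) for c in children)


def subtrees(tree):
    """Yield the tree itself and every subtree below it."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


class KTestableTreeModel:
    """Relative-frequency estimates over depth-k patterns (assumed design)."""

    def __init__(self, k=2):
        if k < 2:
            raise ValueError("this sketch assumes k >= 2")
        self.k = k
        self.n_trees = 0
        self.root_counts = Counter()   # depth-(k-1) pattern at the root
        self.state_counts = Counter()  # depth-(k-1) pattern of each subtree
        self.fork_counts = Counter()   # depth-k pattern of each subtree

    def update(self, tree):
        """Incremental update: adding one tree only increments counters."""
        self.n_trees += 1
        self.root_counts[truncate(tree, self.k - 1)] += 1
        for sub in subtrees(tree):
            self.state_counts[truncate(sub, self.k - 1)] += 1
            self.fork_counts[truncate(sub, self.k)] += 1

    def probability(self, tree):
        """Root probability times the product of transition probabilities."""
        if self.n_trees == 0:
            return 0.0
        p = self.root_counts[truncate(tree, self.k - 1)] / self.n_trees
        for sub in subtrees(tree):
            seen = self.state_counts[truncate(sub, self.k - 1)]
            if seen == 0:
                return 0.0  # unseen state: no smoothing in this sketch
            p *= self.fork_counts[truncate(sub, self.k)] / seen
        return p


model = KTestableTreeModel(k=2)
t1 = ("S", ("a",), ("b",))
t2 = ("S", ("a",), ("S", ("a",), ("b",)))
for t in (t1, t2):
    model.update(t)
print(model.probability(t1), model.probability(t2))
```

With k = 2 the states reduce to node labels and the transitions to a label together with its children's labels, so under these assumptions the sketch behaves like a grammar read directly off a treebank. Because `update` only increments counters, new trees can be added without re-estimating the model from scratch.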
