Relevant Representations for the Inference of Rational Stochastic Tree Languages

Recently, an algorithm, DEES, was proposed for learning rational stochastic tree languages. Given a sample of trees independently and identically drawn according to a distribution defined by a rational stochastic language, DEES outputs a linear representation of a rational series which converges to the target. DEES can thus be used to identify rational stochastic tree languages in the limit with probability one. However, when DEES deals with finite samples, it often outputs a rational tree series which does not define a stochastic language. Moreover, the linear representation cannot be directly used as a generative model. In this paper, we show that any representation of a rational stochastic tree language can be transformed into a reduced normalised representation that can be used to generate trees from the underlying distribution. We also study some consistency properties of rational stochastic tree languages and discuss their implications for inference. Finally, we consider the applicability of DEES to trees built over an unranked alphabet.