A similarity between probabilistic tree languages: application to XML document families

We describe a general approach to computing a similarity measure between distributions generated by probabilistic tree automata. This measure may be used in a number of applications in the pattern recognition field. In particular, we show how the similarity can be computed for families of structured (XML) documents. In this case, the use of regular expressions to specify the right-hand side of the expansion rules adds some complexity to the task.