Clustering by Tree Distance for Parse Tree Normalisation

The application of tree-distance to clustering is considered. Previous work identified some parameters which favourably affect the use of tree-distance in question-answering tasks. Some evidence is given that the same parameters also favourably affect cluster quality. A potential application is in the creation of systems that transform interrogative sentences into indicative ones, a first step in a question-answering system. It is argued that the clustering provides a means to navigate the space of parses assigned to large question sets. A tree-distance analogue of the vector-space notion of a centroid is proposed, which derives from a cluster a kind of pattern tree summarising the cluster.
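
As a rough illustration of the ingredients named above, the following minimal sketch computes a plain unit-cost edit distance between small ordered labelled trees and picks the cluster member with the smallest total distance to the others. Everything in it is an assumption made for illustration: the tuple encoding of trees, the function names, the toy parses, and the unit costs are hypothetical, the distance is the textbook ordered tree edit distance rather than the parameterised variant referred to here, and the medoid is only a simple stand-in for, not an implementation of, the proposed pattern-tree centroid.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children is a tuple of trees.
# A forest is a tuple of trees.

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (naive recursion)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children join the forest.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children join the forest.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Map the two rightmost roots onto each other (cost 1 if labels differ).
    match = (forest_distance(f_kids, g_kids)
             + forest_distance(f[:-1], g[:-1])
             + (f_label != g_label))
    return min(delete, insert, match)

def tree_distance(s, t):
    return forest_distance((s,), (t,))

def medoid(trees):
    """Member with the smallest summed distance to all members."""
    return min(trees, key=lambda c: sum(tree_distance(c, t) for t in trees))

# Hypothetical toy question parses, not taken from any data set.
t1 = ("SQ", (("WP", ()), ("VBZ", ()), ("NP", ())))
t2 = ("SQ", (("WP", ()), ("VBZ", ()), ("NP", (("DT", ()), ("NN", ())))))
t3 = ("SQ", (("WRB", ()), ("VBZ", ()), ("NP", ())))

print(tree_distance(t1, t2))        # 2: insert DT and NN
print(tree_distance(t1, t3))        # 1: relabel WP as WRB
print(medoid([t1, t2, t3]) == t1)   # True
```

The memoised recursion is adequate only for small trees; a dynamic-programming formulation would be the natural choice for parses of realistic size.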