A method of choosing multiway partitions for classification and decision trees

A method is given for choosing k-way partitions (where 2 ≤ k ≤ the number of categories of the predictor variable) in classification or decision tree analyses. The method, like that proposed by Kass, chooses the best partition on the basis of statistical significance and uses the Bonferroni inequality to calculate the significance. Unlike Kass's algorithm, the algorithm does not favour simple partitions (low values of k), nor does it discriminate against free-type (no restriction on the order of values) predictor variables with many categories. A method is also given for adjusting the significance for the number of predictor variables and for using multiple comparisons to put an upper bound on the significance. Monte Carlo tests show that the algorithm gives slightly conservative tests of significance for both small and large samples and does not favour one type of predictor variable over another. The algorithm is incorporated in a PC software program, Knowledgeseeker, which is briefly described.
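The role of the Bonferroni inequality can be illustrated with a minimal sketch. It is not the algorithm of this paper; it shows only the generic adjustment in which a raw p-value is multiplied by the number of candidate k-way partitions and capped at 1. The function names (`count_partitions`, `bonferroni_adjusted`) and the treatment of an ordered predictor as contiguous groupings are assumptions made for illustration.

```python
from math import comb, factorial


def count_partitions(c: int, k: int, ordered: bool = False) -> int:
    """Count the ways to merge c categories into k non-empty groups.

    ordered=True  : only contiguous groupings are allowed (assumed model
                    for a predictor whose category order must be kept).
    ordered=False : any grouping is allowed (free-type predictor); this is
                    the Stirling number of the second kind S(c, k).
    """
    if not 1 <= k <= c:
        raise ValueError("need 1 <= k <= c")
    if ordered:
        # Choose k - 1 cut points among the c - 1 gaps between categories.
        return comb(c - 1, k - 1)
    # Inclusion-exclusion formula for S(c, k); the sum is divisible by k!.
    total = sum((-1) ** i * comb(k, i) * (k - i) ** c for i in range(k + 1))
    return total // factorial(k)


def bonferroni_adjusted(p_value: float, n_comparisons: int) -> float:
    """Bonferroni upper bound on the significance, capped at 1."""
    return min(1.0, p_value * n_comparisons)


if __name__ == "__main__":
    # Hypothetical example: 5 categories merged into 3 groups, raw p = 0.001.
    for ordered in (True, False):
        m = count_partitions(5, 3, ordered=ordered)
        print(ordered, m, bonferroni_adjusted(0.001, m))
```

In this sketch a free-type predictor with many categories has far more candidate partitions than an ordered one, so its multiplier, and hence its adjusted significance, is much larger. That asymmetry is the kind of effect the abstract refers to when it speaks of discrimination against free-type predictors.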