A method is given for choosing k-way partitions (where 2 ≤ k ≤ the number of categories of the predictor variable) in classification or decision tree analyses. Like the method proposed by Kass, it chooses the best partition on the basis of statistical significance and uses the Bonferroni inequality to calculate that significance. Unlike Kass's algorithm, it neither favours simple partitions (low values of k) nor discriminates against free-type predictor variables (those with no restriction on the order of values) that have many categories. A method is also given for adjusting the significance for the number of predictor variables and for using multiple comparisons to put an upper bound on the significance. Monte Carlo tests show that the algorithm gives slightly conservative tests of significance for both small and large samples and does not favour one type of predictor variable over another. The algorithm is incorporated in a PC software program, Knowledgeseeker, which is briefly described.
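To make the role of the Bonferroni inequality concrete, the following minimal Python sketch counts the candidate k-way partitions of a predictor with c categories and uses that count as a Bonferroni multiplier. It illustrates the general Kass-style adjustment the abstract refers to, not the modified adjustment proposed in this paper. The function names (`n_ordered_partitions`, `n_free_partitions`, `bonferroni_bound`) are hypothetical and chosen only for illustration.

```python
from math import comb, factorial


def n_ordered_partitions(c: int, k: int) -> int:
    """Number of ways to split c ordered categories into k contiguous groups."""
    return comb(c - 1, k - 1)


def n_free_partitions(c: int, k: int) -> int:
    """Number of ways to split c unordered (free-type) categories into k
    non-empty groups: the Stirling number of the second kind S(c, k)."""
    total = sum((-1) ** i * comb(k, i) * (k - i) ** c for i in range(k + 1))
    return total // factorial(k)


def bonferroni_bound(p_raw: float, n_tests: int) -> float:
    """Bonferroni upper bound on the significance of the best of n_tests
    candidate partitions, capped at 1."""
    return min(1.0, p_raw * n_tests)


# Illustrative values only (assumed, not taken from the paper):
# a predictor with 5 categories split 3 ways, raw p-value 0.001.
ordered_count = n_ordered_partitions(5, 3)
free_count = n_free_partitions(5, 3)
adjusted_ordered = bonferroni_bound(0.001, ordered_count)
adjusted_free = bonferroni_bound(0.001, free_count)
```

Because a free-type predictor admits far more candidate partitions than an ordered one, the same raw p-value receives a much larger multiplier. This is the source of the bias against free-type predictors with many categories that the abstract describes.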
[1] Rupert G. Miller. Simultaneous Statistical Inference, 1966.
[2] H. J. Einhorn. Alchemy in the Behavioral Sciences, 1972.
[3] G. V. Kass. Significance Testing in Automatic Interaction Detection (A.I.D.), 1975.
[4] Peter Doyle, et al. The Pitfalls of AID Analysis, 1975.
[5] G. V. Kass. An Exploratory Technique for Investigating Large Quantities of Categorical Data, 1980.
[6] G. V. Kass, et al. Automatic Interaction Detection, 1982.
[7] J. Ross Quinlan, et al. Simplifying Decision Trees. Int. J. Man Mach. Stud., 1987.
[8] W. Loh, et al. Tree-Structured Classification via Generalized Discriminant Analysis, 1988.
[9] Ian M. Carter. Application of expert systems: J Ross Quinlan Addison-Wesley, 1987, Hardback, 223 pp £19.95, ISBN: 0 201 17449 9, 1989.