Unsupervised classification via decision trees: an information-theoretic perspective

Integrated sensing and processing decision trees (ISPDT; Priebe et al., 2004) were introduced as a tool for supervised classification of high-dimensional data. In this paper, we consider the problem of unsupervised classification through the recursive construction of an ISPDT, where at each internal node the data (i) are split into clusters and (ii) are transformed independently of the other clusters, guided by an optimization objective. We show that maximizing information-theoretic quantities such as mutual information and α-divergences is theoretically justified for growing an ISPDT, assuming that each data point is generated by a finite-memory random process given the class label. Furthermore, we present heuristics that perform the maximization greedily, and we demonstrate their effectiveness with empirical results from multispectral imaging.
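
For concreteness, the two kinds of quantities named above can be computed for discrete distributions as in the following minimal sketch. It is an illustration only and not the procedure of the paper: the abstract does not state which form of α-divergence is used, so the Rényi form is assumed here, and the function names `mutual_information` and `renyi_divergence` are hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in nats.

    `joint` is a list of rows holding the joint probabilities p(x, y);
    the entries are assumed to be non-negative and to sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # terms with p(x, y) = 0 contribute nothing
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


def renyi_divergence(p, q, alpha):
    """Renyi alpha-divergence D_alpha(p || q) in nats.

    Assumption (not from the source): the Renyi form is used, with
    alpha > 0 and alpha != 1, and q is positive wherever p is positive.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(
        pi ** alpha * qi ** (1 - alpha)
        for pi, qi in zip(p, q)
        if pi > 0
    )
    return math.log(total) / (alpha - 1)
```

Under these assumptions, a candidate split could be scored by, for example, the mutual information between cluster membership and a discretized feature. The objective actually optimized at each node is the one defined in the paper.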