Hierarchical Mixtures of Experts and the EM Algorithm

In both the statistical and the machine learning literatures, divide-and-conquer algorithms have become increasingly popular. The CART algorithm (Breiman et al., 1984) and the MARS algorithm (Friedman, 1991) are well-known examples. These algorithms fit surfaces to data by explicitly dividing the input space into a nested sequence of regions and fitting simple surfaces (e.g., constant functions) within those regions. Their advantages include the interpretability of their solutions and the speed of the training process.
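To make the divide-and-conquer idea concrete, here is a minimal sketch of a regression tree that recursively splits the input space into nested, axis-aligned regions and fits a constant (the mean response) in each region. It is not CART as specified by Breiman et al. and omits refinements such as cost-complexity pruning. The names `Node`, `fit_tree`, `predict`, `max_depth`, and `min_leaf` are hypothetical, chosen for illustration, and the greedy squared-error split criterion is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    # Every node stores the mean response of its region; internal nodes
    # additionally store the split (feature index and threshold).
    value: float
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Sum of squared errors of ys around their mean (cost of a constant fit)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_tree(X: List[List[float]], y: List[float], depth: int = 0,
             max_depth: int = 3, min_leaf: int = 2) -> Node:
    """Greedily divide the input space into nested regions (hypothetical helper)."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return Node(value=mean)
    best = None  # (cost, feature index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    # Stop if no valid split exists or splitting does not reduce the error.
    if best is None or best[0] >= sse(y):
        return Node(value=mean)
    _, j, t = best
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    return Node(
        value=mean, feature=j, threshold=t,
        left=fit_tree([X[i] for i in li], [y[i] for i in li],
                      depth + 1, max_depth, min_leaf),
        right=fit_tree([X[i] for i in ri], [y[i] for i in ri],
                       depth + 1, max_depth, min_leaf),
    )


def predict(node: Node, x: Sequence[float]) -> float:
    """Route x down the tree to its region and return that region's constant."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Usage: a step function is recovered by a single split at x <= 3.
X = [[float(i)] for i in range(8)]
y = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
tree = fit_tree(X, y)
print(tree.threshold, predict(tree, [1.5]), predict(tree, [6.0]))  # 3.0 0.0 5.0
```

The hard, axis-aligned splits are what make such a model easy to interpret: each prediction can be traced to a single region defined by a short chain of threshold tests.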