A Deterministic Annealing Approach for Parsimonious Design of Piecewise Regression Models

A new learning algorithm is proposed for piecewise regression modeling. It employs deterministic annealing to design space-partition regression functions. The performance of traditional space-partition regression methods such as CART and MARS is limited by a simple tree-structured partition and by a hierarchical approach to design. The deterministic annealing algorithm, by contrast, enables the joint optimization of a more powerful piecewise structure based on a Voronoi partition. The new method is demonstrated to achieve consistent performance improvements over regular CART, as well as over its extension that allows arbitrary hyperplane boundaries. Comparison tests on several benchmark data sets from the regression literature are provided.
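To make the Voronoi-partition structure concrete, the following is a minimal sketch of how a piecewise regression function of this kind might be evaluated. It is not the paper's algorithm: the training procedure (the annealing schedule and the joint optimization of prototypes and local models) is omitted, and the names `gibbs_weights`, `predict`, `prototypes`, `local_models`, and `temperature` are hypothetical. It assumes linear local models and the Gibbs-distribution soft association commonly used in deterministic annealing, which hardens into a nearest-prototype (Voronoi) rule as the temperature approaches zero.

```python
import math

def squared_distances(x, prototypes):
    """Squared Euclidean distance from point x to every prototype."""
    return [sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in prototypes]

def gibbs_weights(x, prototypes, temperature):
    """Soft association of x with each region (assumed Gibbs form)."""
    d2 = squared_distances(x, prototypes)
    d_min = min(d2)  # shift by the minimum for numerical stability
    w = [math.exp(-(d - d_min) / temperature) for d in d2]
    total = sum(w)
    return [wi / total for wi in w]

def predict(x, prototypes, local_models, temperature=None):
    """Evaluate a piecewise-linear regression function.

    local_models[j] is a (weights, bias) pair for region j (hypothetical layout).
    With temperature=None the hard Voronoi rule is used: the local model of
    the nearest prototype gives the output. Otherwise the local outputs are
    averaged under the Gibbs association probabilities.
    """
    outputs = [sum(a * xi for a, xi in zip(wts, x)) + b for wts, b in local_models]
    if temperature is None:
        d2 = squared_distances(x, prototypes)
        return outputs[d2.index(min(d2))]
    probs = gibbs_weights(x, prototypes, temperature)
    return sum(p * o for p, o in zip(probs, outputs))

# Two regions on the real line: y = x near 0, y = 5 near 10.
prototypes = [(0.0,), (10.0,)]
local_models = [((1.0,), 0.0), ((0.0,), 5.0)]
hard_left = predict((1.0,), prototypes, local_models)
hard_right = predict((9.0,), prototypes, local_models)
soft_cold = predict((1.0,), prototypes, local_models, temperature=0.01)
soft_hot = predict((1.0,), prototypes, local_models, temperature=1e6)
```

In this toy setup, a very low temperature reproduces the hard Voronoi prediction, while a very high temperature blends the two local models almost equally. An annealing procedure would presumably move between these regimes while fitting the prototypes and local models.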
