Automatic Segmentation by Decision Trees
We present a system for automatic segmentation by decision trees that can cope with large data sets and pays special attention to stability problems. Tree-based methods are statistical procedures for automatic learning from data; their main characteristic is the simplicity of the results obtained. They rely on a recursive algorithm that can be very costly for large data sets and that is highly data-dependent, since small fluctuations in the data may cause large changes in the tree-growing process. Our first aim has been to define data diagnostics that prevent internal instability in the tree-growing process before a particular split is made. We then study the complexity of the algorithm and its applicability to large data sets.
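To make the stability concern concrete, the sketch below shows one possible way to check, before committing to a split at a node, how often each variable would be chosen as the splitting variable under resampling. It is only an illustration and not the diagnostics defined in the paper: the Gini impurity criterion, the bootstrap resampling scheme, and the names `gini`, `best_split` and `split_stability` are all assumptions introduced here.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Exhaustive search for the best binary split at one node.

    Returns (feature_index, threshold, impurity_decrease), or None when
    no feature has two distinct values. Ties keep the first split found.
    """
    n = len(rows)
    parent = gini(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            children = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - children
            if best is None or gain > best[2]:
                best = (j, t, gain)
    return best


def split_stability(rows, labels, n_boot=200, seed=0):
    """Share of bootstrap resamples in which each feature wins the split.

    A share close to 1.0 for a single feature suggests a stable choice;
    shares spread over several features suggest an unstable one.
    """
    rng = random.Random(seed)
    n = len(rows)
    chosen = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        split = best_split([rows[i] for i in idx], [labels[i] for i in idx])
        if split is not None:
            chosen[split[0]] += 1
    total = sum(chosen.values())
    return {j: c / total for j, c in chosen.items()} if total else {}


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
rows = [(1, 5), (2, 3), (3, 8), (4, 1), (6, 7), (7, 2), (8, 6), (9, 4)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
split = best_split(rows, labels)
shares = split_stability(rows, labels)
```

The exhaustive search in `best_split` also hints at the cost issue: each node examines every candidate threshold of every variable, and each candidate requires a pass over the node's observations, which is why recursive partitioning becomes expensive on large data sets.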