Automatic Segmentation by Decision Trees

We present a system for automatic segmentation by decision trees that is able to cope with large data sets and pays special attention to stability problems. Tree-based methods are a statistical technique for automatic learning from data; their main characteristic is the simplicity of the results obtained. They rely on a recursive algorithm that can be very costly for large data sets, and they are highly dependent on the data, since small fluctuations in the data may cause a big change in the tree-growing process. Our first purpose has been to define data diagnostics that prevent internal instability in the tree-growing process before a particular split is made. We then study the complexity of the algorithm and its applicability to large data sets.
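
To make the recursive procedure concrete, the following is a minimal sketch of generic binary tree growing, not the system described here. It assumes numeric predictors, a categorical response, and the Gini impurity as the split criterion; the criterion and the names gini, best_split and grow are illustrative assumptions and do not come from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Exhaustive search for the binary split with the lowest weighted impurity.

    Returns (feature_index, threshold, weighted_impurity), or None when no
    feature has two distinct values. Ties keep the first candidate found.
    """
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def grow(rows, labels, min_size=2):
    """Recursively grow a tree.

    A leaf is the majority class label; an internal node is the tuple
    (feature_index, threshold, left_subtree, right_subtree).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(rows) < min_size:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    j, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (
        j,
        t,
        grow([r for r, _ in left], [y for _, y in left], min_size),
        grow([r for r, _ in right], [y for _, y in right], min_size),
    )
```

In this naive form every node rescans all of its observations for each candidate threshold of each predictor, which suggests why the recursion becomes expensive on large data sets. When two candidate splits have nearly equal impurity, changing a single observation can change which one is selected, and every split below that node may then differ; this is the kind of instability that diagnostics computed before the split are meant to flag.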