A sample selection algorithm in fuzzy decision tree induction and its theoretical analyses

The generalization capability of a classifier is likely to degrade when the classifier is trained on a dataset containing redundant instances. To remove this redundancy, sample selection methods, which choose the most valuable and representative instances from the original dataset, can be used to obtain a subset of it. The expectation is that a classifier trained on the subset achieves generalization capability no lower than that of a classifier trained on the original dataset. This paper proposes a sample selection method for fuzzy decision tree induction based on the maximum entropy of testing instances, and gives the related theoretical analyses.
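The abstract does not spell out the selection procedure, so the following is only a minimal sketch of maximum-entropy sample selection under stated assumptions, not the paper's algorithm. It assumes that a fuzzy decision tree trained on the current subset yields a class-membership vector for each testing (candidate) instance, that these memberships are normalized to sum to 1 before taking Shannon entropy, and that instances are added greedily one at a time. The names `train_and_predict`, `select_samples`, `membership_entropy`, and the `seed` parameter are hypothetical; tree induction itself is delegated to the `train_and_predict` callable.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical signature: (training instances, candidate instances) -> one
# class-membership vector per candidate, e.g. from a fuzzy decision tree.
TrainAndPredict = Callable[[List[object], List[object]], List[Sequence[float]]]


def membership_entropy(memberships: Sequence[float]) -> float:
    """Shannon entropy (natural log) of memberships after normalizing them to sum to 1."""
    total = sum(memberships)
    if total <= 0:
        return 0.0  # no class support at all: treated here as zero entropy
    entropy = 0.0
    for m in memberships:
        p = m / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy


def select_samples(pool: Sequence[object], k: int,
                   train_and_predict: TrainAndPredict,
                   seed: Sequence[int] = ()) -> List[int]:
    """Greedily add up to k pool instances whose predicted memberships have maximum entropy.

    Returns the indices of the selected subset (seed indices first).
    """
    selected = list(seed)
    already = set(selected)
    remaining = [i for i in range(len(pool)) if i not in already]
    for _ in range(min(k, len(remaining))):
        # Retrain on the current subset and score every remaining instance.
        memberships = train_and_predict([pool[i] for i in selected],
                                        [pool[i] for i in remaining])
        # The most uncertain (highest-entropy) instance is assumed most informative.
        best = max(range(len(remaining)),
                   key=lambda j: membership_entropy(memberships[j]))
        selected.append(remaining.pop(best))
    return selected
```

For example, with a stub `train_and_predict` that simply returns each candidate's own membership vector, the pool `[[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]` with `k=2` yields the indices `[1, 2]`: the evenly split instance is picked first, then the less certain of the remaining two. Retraining inside the loop lets each entropy score reflect the subset selected so far; a real fuzzy decision tree would typically need a non-empty `seed` to be trainable in the first round.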