Analysis of nursing-care freestyle Japanese text classification using GA-based term selection

This paper analyzes the classification performance of term selection based on a genetic algorithm (GA). GA-based term selection optimizes two objectives: maximizing the number of correctly classified texts and minimizing the number of selected terms. Each individual in the GA is evaluated by an objective function based on the classification performance of a support vector machine (SVM) under 10-fold cross-validation. GA-based term selection therefore aims to improve classification performance on the testing text sets. As a result, it degrades performance on unseen texts in actual use, because terms are deleted excessively even when they play an important role in classification. Numerical simulation results clarify the relation between the terms deleted by GA-based term selection and the terms that appear in unseen texts.
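The relation between deleted terms and the terms of unseen texts can be illustrated with a small sketch. It is not the procedure used in the paper, only one plausible way to tabulate the overlap. The function name `deleted_term_coverage`, the 0/1 mask encoding of a GA individual, and the toy vocabulary are all assumptions introduced for illustration.

```python
def deleted_term_coverage(vocabulary, selected_mask, unseen_texts):
    """Summarise how many terms in unseen texts were removed by term selection.

    vocabulary    : list of terms, one per gene of a GA individual (assumed encoding)
    selected_mask : list of 0/1 flags; 1 keeps the term, 0 deletes it
    unseen_texts  : list of tokenised texts, each a list of terms
    """
    if len(vocabulary) != len(selected_mask):
        raise ValueError("vocabulary and selected_mask must have equal length")

    # Split the vocabulary into kept and deleted terms according to the mask.
    kept = {term for term, bit in zip(vocabulary, selected_mask) if bit}
    deleted = set(vocabulary) - kept

    # Collect the distinct terms that occur in the unseen texts.
    unseen_terms = set()
    for text in unseen_texts:
        unseen_terms.update(text)

    hit_deleted = unseen_terms & deleted             # occur in unseen texts but were deleted
    hit_kept = unseen_terms & kept                   # occur in unseen texts and were kept
    out_of_vocab = unseen_terms - set(vocabulary)    # never seen during term selection

    total = len(unseen_terms)
    return {
        "deleted_in_unseen": sorted(hit_deleted),
        "kept_in_unseen": sorted(hit_kept),
        "out_of_vocabulary": sorted(out_of_vocab),
        "deleted_ratio": len(hit_deleted) / total if total else 0.0,
    }


# Toy data (hypothetical, not taken from the nursing-care corpus).
vocabulary = ["fever", "meal", "bath", "walk", "sleep"]
selected_mask = [1, 0, 1, 0, 0]
unseen_texts = [["fever", "walk"], ["meal", "cough"]]

report = deleted_term_coverage(vocabulary, selected_mask, unseen_texts)
print(report)
```

A high `deleted_ratio` would indicate that many terms appearing in unseen texts were discarded during selection, which is the situation the abstract associates with degraded performance in actual use.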