Paper Information - Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification

This paper presents an extended Relief-F algorithm for nominal attribute estimation and applies it to small-document classification. Relief algorithms are general, successful instance-based feature-filtering algorithms for data classification and regression. Many improved Relief algorithms have been introduced to address redundancy, irrelevant and noisy features, and the algorithms' limitations on multiclass datasets. However, these algorithms have rarely been applied to text classification, because the large number of features in multiclass datasets leads to high time complexity. To apply them to text feature filtering and classification, we presented an extended Relief-F algorithm for numerical attribute estimation (E-Relief-F) in 2007, but we subsequently found limitations and problems with it. In this paper, we describe additional problems that Relief algorithms face in text feature filtering: similarity and weight computations are adversely affected when an instance contains only a small number of features, some instances have no nearest hits or misses, and time complexity is high. We then propose a new extended Relief-F algorithm for nominal attribute estimation (E-Relief-Fd) to solve these problems, and we apply it to small text-document classification. In experiments, we used the algorithm to estimate feature quality on various datasets, applied it to classification, and compared its performance with that of existing Relief algorithms. The experimental results show that the new E-Relief-Fd algorithm performs better than previous Relief algorithms, including E-Relief-F.
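
For readers unfamiliar with the nearest-hit and nearest-miss terminology used in the abstract, the following is a minimal sketch of the basic Relief weight update for nominal attributes. It is not the E-Relief-Fd algorithm proposed in the paper, whose details the abstract does not give. The function names (`diff`, `distance`, `relief_nominal`) are hypothetical, every instance is visited once instead of being randomly sampled, and skipping instances that have no hit or no miss is an assumption made only to keep the sketch runnable.

```python
from typing import Hashable, List, Sequence


def diff(a: Hashable, b: Hashable) -> int:
    """Nominal-attribute difference: 0 if the values match, 1 otherwise."""
    return 0 if a == b else 1


def distance(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Number of attributes on which two instances disagree."""
    return sum(diff(a, b) for a, b in zip(x, y))


def relief_nominal(X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> List[float]:
    """Basic Relief weights (one nearest hit, one nearest miss per instance)."""
    n = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    used = 0  # instances that had both a hit and a miss

    for i in range(n):
        # Hits share the class of instance i; misses belong to any other class.
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            # Assumption: skip instances without a nearest hit or miss.
            continue

        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))

        # Reward attributes that differ from the miss; penalise those
        # that differ from the hit.
        for f in range(n_features):
            weights[f] += diff(X[i][f], X[miss][f]) - diff(X[i][f], X[hit][f])
        used += 1

    # Average over the instances actually used; all zeros if none were usable.
    return [w / used for w in weights] if used else weights
```

On a toy dataset in which the first attribute determines the class and the second does not, `relief_nominal([("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")], [0, 0, 1, 1])` returns `[1.0, -1.0]`, so the class-determining attribute receives the higher weight.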

Hyuk-Chul Kwon | Heum Park
