Combination of Multiple Feature Selection Methods for Text Categorization by using Combinatorial Fusion Analysis and Rank-Score Characteristic

Effective feature selection methods are important for improving the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. Extensive research has been done to improve the performance of individual feature selection methods, yet it remains a challenge to devise a single feature selection method that outperforms the others in most cases. In this paper, we explore the possibility of improving overall performance by combining multiple individual feature selection methods. In particular, we propose a method for combining multiple feature selection methods by using an information fusion paradigm called Combinatorial Fusion Analysis (CFA). A rank-score function and its associated graph, called the rank-score graph, are adopted to measure the diversity of different feature selection methods. Our experimental results demonstrate that a combination of multiple feature selection methods can outperform a single method only if each individual method exhibits distinct scoring behavior and relatively high performance. Moreover, it is shown that the rank-score function and rank-score graph are useful for selecting which feature selection methods to combine.
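
To make the rank-score idea concrete, below is a minimal Python sketch of how a rank-score function, a diversity measure between two feature selection methods, and simple score/rank fusion could be implemented. The function names, the mean-absolute-difference diversity measure, and the example scores (standing in for, e.g., information gain and chi-square values over the same candidate terms) are illustrative assumptions, not the exact formulation used in the paper.

```python
import numpy as np

def normalize_scores(scores):
    """Min-max normalize raw feature scores to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)

def rank_score_function(scores):
    """Rank-score function f(i): the normalized score of the feature at rank i
    (rank 1 = highest score)."""
    return np.sort(normalize_scores(scores))[::-1]

def rank_score_diversity(scores_a, scores_b):
    """Diversity of two scoring systems: mean absolute difference between
    their rank-score functions (a simple area-between-curves measure)."""
    return float(np.mean(np.abs(rank_score_function(scores_a) -
                                rank_score_function(scores_b))))

def score_combination(scores_a, scores_b):
    """Score fusion: average of the normalized scores of each feature."""
    return (normalize_scores(scores_a) + normalize_scores(scores_b)) / 2.0

def rank_combination(scores_a, scores_b):
    """Rank fusion: average of the ranks assigned by each method
    (a lower combined rank indicates a better feature)."""
    def ranks(scores):
        order = np.argsort(-np.asarray(scores, dtype=float))
        r = np.empty_like(order)
        r[order] = np.arange(1, len(order) + 1)
        return r
    return (ranks(scores_a) + ranks(scores_b)) / 2.0

# Hypothetical scores for six candidate terms from two feature selection
# methods (e.g., information gain and chi-square):
ig  = [0.42, 0.10, 0.35, 0.05, 0.28, 0.17]
chi = [3.1,  0.4,  2.8,  0.2,  1.9,  2.5]
print(rank_score_diversity(ig, chi))   # higher value = more diverse scoring behavior
print(score_combination(ig, chi))      # fused scores; keep the top-k features
```

Under this kind of fusion scheme, the features with the best fused scores (or lowest combined ranks) would be retained, and the diversity value would help judge whether combining two methods is likely to pay off.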
