Text Categorization Using a Novel Feature Selection Technique Combined with ELM

The rapid growth of digital documents and internet users on the web increases the time an end user spends searching for a document, which degrades the performance of search engines. Hence, to reduce searching time and increase search-engine efficiency, text classification has become essential. Efficient text classification, however, depends equally on selecting good features. To address this issue, this paper proposes an approach called Combined Correlation Discriminative Power Measure (CCDPM), in which highly correlated terms (features) are first removed from the corpus, and the remaining uncorrelated features are then ranked using the scores generated by the discriminative power measure technique. The top k features are selected to generate the reduced training feature vector. Extreme Learning Machine (ELM) is used for classification, and empirical results on four benchmark datasets show the efficiency of the proposed approach compared with other state-of-the-art feature selection techniques. ELM also yields more promising results than other conventional classifiers.
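The pipeline summarized above (correlation filter, then ranking, then top-k selection, then ELM) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's CCDPM: the correlation threshold (0.9), the greedy keep-first rule, and the discriminative-power score are hypothetical stand-ins, and all function and class names are invented for illustration. The ELM follows the basic single-hidden-layer formulation (random fixed input weights, sigmoid hidden layer, output weights from the Moore-Penrose pseudoinverse).

```python
import numpy as np


def remove_correlated(X, threshold=0.9):
    """Greedy correlation filter (assumed rule): scan terms left to right and keep
    a term only if |Pearson r| with every already-kept term is <= threshold."""
    # Columns are terms; constant columns yield nan, which we treat as 0 correlation.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)


def discriminative_power(X, y):
    """Hypothetical stand-in for the DPM score: for each class c, the gap between
    the fraction of documents in c containing the term and the fraction outside c,
    summed over all classes. Not the paper's exact formula."""
    present = X > 0
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        in_c = y == c
        p_in = present[in_c].mean(axis=0)
        p_out = present[~in_c].mean(axis=0) if (~in_c).any() else 0.0
        scores += np.abs(p_in - p_out)
    return scores


def ccdpm_select(X, y, k, threshold=0.9):
    """Drop highly correlated terms, rank the survivors by score, keep the top k.
    Returns column indices into the original X."""
    kept = remove_correlated(X, threshold)
    scores = discriminative_power(X[:, kept], y)
    order = np.argsort(-scores, kind="stable")  # highest score first; ties keep column order
    return kept[order[:k]]


class ELM:
    """Basic single-hidden-layer ELM: random input weights and biases stay fixed;
    only the output weights are solved, via the Moore-Penrose pseudoinverse."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random projection X @ W + b.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # least-squares output weights
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


# Toy document-term matrix (4 documents x 4 terms); terms 0 and 1 are identical.
X_toy = np.array([[2, 2, 0, 1],
                  [1, 1, 0, 0],
                  [0, 0, 3, 1],
                  [0, 0, 1, 0]], dtype=float)
y_toy = np.array([0, 0, 1, 1])

selected = ccdpm_select(X_toy, y_toy, k=2)
print(selected)  # -> [0 2]
model = ELM(n_hidden=10, seed=0).fit(X_toy[:, selected], y_toy)
print(model.predict(X_toy[:, selected]))
```

In the toy run, term 1 is discarded as a perfect duplicate of term 0, and term 3 receives a score of 0 because it occurs equally often in both classes, so the top two surviving terms are 0 and 2. Filtering correlated terms before ranking keeps redundant terms from occupying several of the k slots.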
