Feature unionization: A novel approach for dimension reduction

Highlights
- We propose a novel dimension reduction approach called feature unionization.
- The proposed approach constructs a compact and discriminative feature subset that improves classification performance.
- Several statistical tests confirm the robustness and effectiveness of the proposed approach.
- The proposed approach is suitable for high-dimensional datasets.

Abstract
Dimension reduction is an effective way to improve classification performance in machine learning. Removing irrelevant features decreases training time and may increase classification accuracy. Although feature selection, as a dimension reduction method, can select a reduced feature subset, the size of that subset can be reduced further and its discriminative power improved further. In this paper, a novel approach, called feature unionization, is proposed for dimension reduction in classification. Using the union operator, this approach combines several features to construct a single, more informative feature. To verify the effectiveness of feature unionization, several experiments were carried out on fourteen publicly available datasets in the sentiment classification domain using three typical classifiers. The experimental results showed that the proposed approach worked efficiently and outperformed the feature selection approach.
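The abstract does not specify how features are grouped before they are unionized, so the following is only a minimal sketch of the general idea: binary term-presence features are ranked by a relevance score and consecutive groups in the ranking are merged into a single feature with an element-wise logical OR (the union operator). The use of chi-squared scores, the group size, and the number of groups are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of feature unionization on binary term-presence features.
# Assumptions (not specified in the abstract): features are 0/1 bag-of-words
# indicators, and the groups to unionize are consecutive chunks of a
# chi-squared relevance ranking. The union of a group is the element-wise
# logical OR of its columns.
import numpy as np
from sklearn.feature_selection import chi2

def unionize_features(X, y, group_size=3, n_groups=50):
    """Combine ranked binary features into unioned features via logical OR."""
    scores, _ = chi2(X, y)              # relevance score for each feature
    ranked = np.argsort(scores)[::-1]   # most relevant features first
    unioned_cols = []
    for g in range(n_groups):
        idx = ranked[g * group_size:(g + 1) * group_size]
        if len(idx) == 0:
            break
        # Union (OR) of the selected binary columns -> one combined feature
        unioned_cols.append((X[:, idx].sum(axis=1) > 0).astype(int))
    return np.column_stack(unioned_cols)

# Toy usage with a random binary document-term matrix
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 500))   # 200 documents, 500 binary features
y = rng.integers(0, 2, size=200)          # binary sentiment labels
X_union = unionize_features(X, y, group_size=3, n_groups=40)
print(X_union.shape)  # (200, 40): 120 ranked features merged into 40
```

Each unioned column fires whenever any of its constituent terms appears in a document, which is why the combined feature can be both more compact and more informative than the individual features it replaces.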
