Scaling feature selection method for enhancing the classification performance of Support Vector Machines in text mining

Abstract: Classifying opinions from customer reviews is a complex process because the feature space is high-dimensional. In this study, our objective is to select the minimum number of features needed to classify reviews effectively. The tf-idf and Glasgow methods are commonly used for feature selection in opinion mining. We propose two modifications to the traditional tf-idf and Glasgow expressions, using graphical representations, to reduce the size of the feature set. The accuracy of the proposed expressions is established using a support vector machine classifier. In addition, a new framework is devised to measure the effectiveness of the term weighting expressions adopted for feature selection. Finally, the strength of the expressions is established through evaluation criteria and effectiveness, and this strength is tested statistically. In our experiments, the modified tf-idf and Glasgow methods performed better than the traditional term weighting expressions at extracting the minimum number of prominent features required for classification, thus enhancing the performance of the support vector machine.
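As a point of reference for the baseline the abstract mentions, the following is a minimal sketch of feature selection with the traditional tf-idf weighting, assuming the common form tf(t, d) × log(N / df(t)). It does not reproduce the modified tf-idf or Glasgow expressions proposed in the paper, which are not given in the abstract. The function names `tfidf_scores` and `select_top_k`, the choice to rank terms by their highest weight in any document, and the toy reviews are all illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Assumes the textbook form tf * log(N / df); this is the traditional
    baseline, not the modified expression from the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def select_top_k(docs, k):
    """Keep the k terms with the highest tf-idf weight in any document.

    Ranking by the maximum weight is an assumption made for this sketch;
    ties are broken alphabetically so the result is deterministic.
    """
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, weight in doc_scores.items():
            best[term] = max(best.get(term, 0.0), weight)
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:k]


# Hypothetical toy reviews, tokenized by whitespace.
reviews = [
    "the battery life is great".split(),
    "the screen is dim".split(),
    "the battery died fast".split(),
]
features = select_top_k(reviews, 3)
print(features)
```

A term such as "the", which occurs in every review, receives weight zero and is never selected. The reduced feature set would then be used to build the document vectors passed to a classifier such as a support vector machine.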
