A Novel Text Classification Method Using Comprehensive Feature Weight

Because the category distribution of short text corpora is imbalanced, it is difficult to obtain accurate results in short text classification. To address this problem, this paper proposes a novel short text classification method based on comprehensive feature weights. The method takes into account the distribution of samples in the positive and negative categories as well as the category correlation of words, which improves the existing feature weight calculation and yields a new way of computing a comprehensive feature weight. Experimental results show that the proposed method achieves significantly higher micro-averaged and macro-averaged values than other feature-weighting methods, indicating that it can greatly improve the accuracy and recall of short text classification.
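
The abstract does not state the paper's comprehensive weight formula, so the sketch below is not the proposed method. As a hedged illustration of what a weight that "takes into account the samples in the positive and negative categories" can look like, it implements the known supervised relevance-frequency scheme (tf-rf) of Lan et al., computed one-vs-rest for a chosen category. The function names, the toy documents and the category labels are hypothetical.

```python
import math
from collections import Counter


def relevance_frequency(term, docs, labels, positive):
    """tf-rf relevance factor: rf = log2(2 + a / max(1, c)).

    a = number of positive-category documents containing the term,
    c = number of negative-category (all other) documents containing it.
    """
    a = sum(1 for d, y in zip(docs, labels) if y == positive and term in d)
    c = sum(1 for d, y in zip(docs, labels) if y != positive and term in d)
    return math.log2(2 + a / max(1, c))


def tf_rf_vector(doc_tokens, docs, labels, positive):
    """Weight each term of one document by raw term frequency times rf."""
    tf = Counter(doc_tokens)
    return {t: n * relevance_frequency(t, docs, labels, positive) for t, n in tf.items()}


# Hypothetical toy corpus: each training document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"market", "team"}]
labels = ["sport", "sport", "finance", "finance"]

weights = tf_rf_vector(["goal", "goal", "team"], docs, labels, positive="sport")
print(weights)
```

A term that occurs only in the positive category ("goal") receives a larger factor than one spread across both categories ("team"), while a term found only in negative documents falls back to the minimum factor of 1. The paper's method additionally models the category correlation of words, which this sketch does not attempt.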
