A Novel Term Weighting Scheme Model

The volume of textual data has grown rapidly in recent years, driven by online platforms such as Facebook, Twitter, Wikipedia, and blogs. Analyzing this mass of text makes it possible to categorize and label new content automatically. Before classification, term weighting is a crucial step: it represents documents in a form suitable for classification algorithms. In this paper, we survey existing term weighting schemes and propose an efficient scheme that yields better classification accuracy than the well-known TF-IDF, the recent TF-IGM, and other term weighting schemes in the literature.
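For readers unfamiliar with the TF-IDF baseline named above, the following is a minimal sketch of one common variant, assuming raw term counts for TF and an unsmoothed `log(N / df)` for IDF. It is not the scheme proposed in this paper; the function name `tf_idf` and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy corpus (illustrative only), already tokenized on whitespace.
docs = [
    "the match ended in a draw".split(),
    "the election ended in a recount".split(),
    "the team won the match".split(),
]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document (here "the") receives weight zero, while a term confined to a single document (here "draw") receives the largest IDF factor. TF-IDF uses no class labels, which is the gap that supervised schemes such as TF-IGM aim to address.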
