On entropy-based term weighting schemes for text categorization

[1]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[2]  Bin Cao,et al.  Short text classification by detecting information path , 2013, CIKM.

[3]  Ruixuan Li,et al.  Self-inhibition Residual Convolutional Networks for Chinese Sentence Classification , 2018, ICONIP.

[4]  Orestis Papakyriakopoulos,et al.  Bias in word embeddings , 2020, FAT*.

[5]  Youngjoong Ko,et al.  A study of term weighting schemes using class information for text classification , 2012, SIGIR '12.

[6]  Claude E. Shannon.  A mathematical theory of communication , 1948, Bell Syst. Tech. J.

[7]  Milos Hauskrecht,et al.  Boosting KNN text classification accuracy by using supervised term weighting schemes , 2009, CIKM.

[8]  Diomidis Spinellis,et al.  Word Embeddings for the Software Engineering Domain , 2018, IEEE/ACM 15th International Conference on Mining Software Repositories (MSR).

[9]  Xiaodong Gu,et al.  Balancing between over-weighting and under-weighting in supervised term weighting , 2016, Inf. Process. Manag.

[10]  Christophe Moulin,et al.  Entropy based feature selection for text categorization , 2011, SAC.

[11]  Jiajia Luo,et al.  Exploiting Syntactic and Semantic Information for Textual Similarity Estimation , 2021.

[12]  Aïcha Mokhtari,et al.  Combining supervised term-weighting metrics for SVM text classification with extended term representation , 2016, Knowledge and Information Systems.

[13]  ChengXiang Zhai,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[14]  James Bailey,et al.  Effective global approaches for mutual information based feature selection , 2014, KDD.

[15]  M. Warrens.  On Association Coefficients for 2×2 Tables and Properties That Do Not Depend on the Marginal Distributions , 2008, Psychometrika.

[16]  E. Jaynes.  Information Theory and Statistical Mechanics , 1957.

[17]  Shiwei Tang,et al.  A Comparative Study on Feature Weight in Text Categorization , 2004, APWeb.

[18]  Xipeng Qiu,et al.  Pre-trained models for natural language processing: A survey , 2020, Science China Technological Sciences.

[19]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[20]  Sholom M. Weiss,et al.  Automated learning of decision rules for text categorization , 1994, TOIS.

[21]  Gerard Salton,et al.  A comparison of search term weighting: term relevance vs. inverse document frequency , 1981, SIGIR '81.

[22]  Fabrizio Sebastiani,et al.  Supervised term weighting for automated text categorization , 2003, SAC '03.

[23]  Gerard Salton,et al.  On the Specification of Term Values in Automatic Indexing , 1973.

[24]  Thomas G. Dietterich.  Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[25]  Huanhuan Yuan,et al.  Sentiment Analysis Based on Weighted Word2vec and Att-LSTM , 2018, CSAI '18.

[26]  Adam Tauman Kalai,et al.  What are the Biases in My Word Embedding? , 2018, AIES.

[27]  Susumu Horiguchi,et al.  Learning to classify short and sparse text & web with hidden topics from large-scale data collections , 2008, WWW.

[28]  David D. Lewis,et al.  Evaluating Text Categorization I , 1991, HLT.

[29]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag.

[30]  Wenyin Liu,et al.  Term Weighting Schemes for Question Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Karen Sparck Jones.  A statistical interpretation of term specificity and its application in retrieval , 1972.

[32]  Qiaozhu Mei,et al.  PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks , 2015, KDD.

[33]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[34]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[36]  Hans Peter Luhn,et al.  A Statistical Approach to Mechanized Encoding and Searching of Literary Information , 1957, IBM J. Res. Dev.

[37]  Vipin Kumar,et al.  Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification , 2001, PAKDD.

[38]  Youngjoong Ko,et al.  A new term-weighting scheme for text classification using the odds of positive and negative class probabilities , 2015, J. Assoc. Inf. Sci. Technol.

[39]  Hui Xiong,et al.  A semantic term weighting scheme for text categorization , 2011, Expert Syst. Appl.

[40]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[41]  Boqin Feng,et al.  An Extended Supervised Term Weighting Method for Text Categorization , 2011.

[42]  Hao Zhang,et al.  Turning from TF-IDF to TF-IGM for term weighting in text classification , 2016, Expert Syst. Appl.

[43]  Jörg Kindermann,et al.  Text Categorization with Support Vector Machines. How to Represent Texts in Input Space? , 2002, Machine Learning.

[44]  Nazlia Omar,et al.  Question classification based on Bloom’s taxonomy cognitive domain using modified TF-IDF and word2vec , 2020, PLoS ONE.

[45]  Stanley F. Chen,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[46]  Jiaul H. Paik.  A novel TF-IDF weighting scheme for effective ranking , 2013, SIGIR.