Output-based transfer learning in genetic programming for document classification

Abstract Transfer learning has been studied in document classification as a way to transfer a model trained on a source domain (SD) to a relatively similar target domain (TD). Feature-based transfer learning techniques investigate which features are transferred from SD to TD. This paper investigates an output-based transfer learning system that uses Genetic Programming (GP), which automatically selects features to construct classifiers, for document classification tasks. The proposed GP system generates programs directly from a set of sparse features and considers only the change in the output of the evolved programs from SD to TD. A linear model then combines existing GP programs from SD as features for TD. In addition, new GP programs are mutated from the programs evolved in SD to improve accuracy. By directly using the evolved GP programs and their mutations, the feature extraction and estimation processes on TD are avoided. The experimental results demonstrate that the GP programs from SD can be used effectively to classify documents in the relevant TD. The results also show that effective classifiers are easy to train on TD when the GP programs are used as features. Furthermore, the proposed linear model, which takes multiple GP programs from SD as its inputs, outperforms single GP programs obtained directly from TD.
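
The idea of reusing SD programs as features for a linear model on TD can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three hand-written functions stand in for GP programs evolved on SD, the toy documents and labels stand in for TD data, and a plain perceptron stands in for the linear model, whose exact form the abstract does not specify. The mutation step is omitted.

```python
# Hypothetical stand-ins for GP programs evolved on the source domain (SD).
# Each maps a sparse document (term -> weight) to one real-valued output.
def prog_a(doc):
    return doc.get("goal", 0.0) + doc.get("match", 0.0) - doc.get("market", 0.0)

def prog_b(doc):
    return doc.get("team", 0.0) * 2.0 - doc.get("stock", 0.0)

def prog_c(doc):
    return doc.get("score", 0.0) - doc.get("profit", 0.0) - doc.get("bank", 0.0)

SD_PROGRAMS = [prog_a, prog_b, prog_c]

def program_outputs(doc, programs):
    """Use the outputs of the SD programs as the feature vector of a TD document."""
    return [program(doc) for program in programs]

def train_linear(docs, labels, programs, epochs=50, lr=0.1):
    """Fit weights over program outputs with a perceptron (labels are +1 or -1)."""
    weights = [0.0] * len(programs)
    bias = 0.0
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            x = program_outputs(doc, programs)
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if label * score <= 0:  # misclassified or on the boundary: update
                weights = [w + lr * label * xi for w, xi in zip(weights, x)]
                bias += lr * label
    return weights, bias

def predict(doc, weights, bias, programs):
    x = program_outputs(doc, programs)
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1

# Toy target-domain (TD) documents: +1 = sports, -1 = finance (made-up data).
td_docs = [
    {"goal": 1.0, "coach": 1.0},
    {"team": 1.0, "score": 1.0},
    {"match": 1.0, "league": 1.0},
    {"stock": 1.0, "trader": 1.0},
    {"profit": 1.0, "bank": 1.0},
    {"market": 1.0, "shares": 1.0},
]
td_labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear(td_docs, td_labels, SD_PROGRAMS)
predictions = [predict(doc, weights, bias, SD_PROGRAMS) for doc in td_docs]
print(weights, bias, predictions)
```

Note that no feature extraction is performed on TD in this sketch: terms that appear only in TD (such as "coach" or "shares") are ignored, and the classifier sees only the outputs of the SD programs.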
