An ant colony-based semi-supervised approach for learning classification rules

Semi-supervised learning methods build models from a few labeled instances and a large number of unlabeled instances. They are a good option in scenarios where unlabeled data is abundant and labeling instances is expensive, as is the case for most Web applications. This paper proposes a semi-supervised self-training algorithm called Ant-Labeler. Self-training algorithms use a supervised learning algorithm to iteratively learn a model from the labeled instances and then use this model to classify the unlabeled instances. The instances that receive labels with high confidence are moved from the unlabeled set to the labeled set, and this process is repeated until a stopping criterion is met, such as all unlabeled instances having been labeled. Ant-Labeler uses an ant colony optimization (ACO) algorithm as the supervised learning method in the self-training procedure to generate interpretable rule-based models, which are combined as an ensemble to ensure accurate predictions. The pheromone matrix is reused across executions of the ACO algorithm to avoid rebuilding the models from scratch every time the labeled set is updated. Results showed that the proposed algorithm obtains better predictive accuracy than three state-of-the-art algorithms on roughly half of the datasets on which it was tested, and that the fewer labeled instances are available, the better Ant-Labeler performs.
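The generic self-training loop described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the Ant-Labeler implementation: the base learner here is a hypothetical nearest-centroid classifier (`train_centroids`) standing in for the ACO rule-based ensemble, the confidence measure and the `threshold` and `max_rounds` parameters are invented for the example, and pheromone reuse between rounds is not modeled.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
# A model maps an instance to (predicted label, confidence in [0, 1]).
Model = Callable[[Instance], Tuple[int, float]]


def train_centroids(X: List[Instance], y: List[int]) -> Model:
    """Hypothetical stand-in base learner: nearest class centroid.

    Ant-Labeler would train an ACO rule-based ensemble at this point.
    """
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(x: Instance) -> Tuple[int, float]:
        dists = {
            c: sum((a - b) ** 2 for a, b in zip(x, mu)) ** 0.5
            for c, mu in centroids.items()
        }
        ranked = sorted(dists, key=dists.get)
        if len(ranked) == 1:
            return ranked[0], 1.0
        d1, d2 = dists[ranked[0]], dists[ranked[1]]
        # Assumed confidence: relative closeness to the nearest centroid.
        conf = 1.0 - d1 / (d1 + d2) if d1 + d2 > 0 else 0.5
        return ranked[0], conf

    return predict


def self_train(X_lab, y_lab, X_unl, train=train_centroids,
               threshold=0.8, max_rounds=50):
    """Move confidently labeled instances from the unlabeled to the labeled set."""
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        if not X_unl:  # stopping criterion: every instance has a label
            break
        model = train(X_lab, y_lab)
        preds = [model(x) for x in X_unl]
        confident = {i for i, (_, c) in enumerate(preds) if c >= threshold}
        if not confident:  # no confident prediction left, so stop early
            break
        for i in sorted(confident):
            X_lab.append(X_unl[i])
            y_lab.append(preds[i][0])
        X_unl = [x for i, x in enumerate(X_unl) if i not in confident]
    return train(X_lab, y_lab), X_lab, y_lab, X_unl


if __name__ == "__main__":
    labeled = [(0.0,), (10.0,)]
    labels = [0, 1]
    unlabeled = [(1.0,), (9.0,), (4.0,), (6.0,)]
    model, X_all, y_all, remaining = self_train(labeled, labels, unlabeled)
    print(y_all, remaining)
```

In this sketch the model is retrained from scratch on every round. The abstract states that Ant-Labeler avoids that cost by carrying the pheromone matrix over from one ACO execution to the next.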
