Co-training Based Attribute Reduction for Partially Labeled Data

Rough set theory is an effective supervised learning model for labeled data. In practice, however, problems often involve both labeled and unlabeled data. This paper studies attribute reduction for such partially labeled data. A novel semi-supervised attribute reduction algorithm based on co-training is proposed, which exploits the unlabeled data to improve the quality of attribute reducts computed from only a few labeled examples. The algorithm first obtains two diverse reducts of the labeled data and uses them to train two base classifiers, then co-trains the two classifiers iteratively. In each round, the base classifiers teach each other on the unlabeled data and thereby enlarge the labeled data, so that higher-quality reducts can be computed from the enlarged labeled set and used to build base classifiers with better performance. Experimental results on UCI data sets show that the proposed algorithm improves the quality of the reducts.
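The loop described above (compute two diverse reducts, train two base classifiers, exchange pseudo-labels, enlarge the labeled set, recompute the reducts) can be sketched in a few dozen lines. The Python code below is a minimal illustration, not the authors' implementation: the helper names `dependency`, `greedy_reduct`, and `co_train_reduction` are hypothetical, the Pawlak positive-region dependency degree stands in for whatever reduct criterion the paper actually uses, decision trees stand in for the base classifiers, and the diversity mechanism (excluding the first reduct's attributes when computing the second) is one simple assumption.

```python
# Minimal sketch of co-training based attribute reduction (assumptions noted above).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def dependency(X, y, attrs):
    """Pawlak dependency degree: fraction of objects in the positive region of
    the partition induced by the attribute subset `attrs` (assumes discrete values)."""
    if not attrs:
        return 0.0
    keys = [tuple(row) for row in X[:, attrs]]
    consistent = 0
    for k in set(keys):
        idx = [i for i, key in enumerate(keys) if key == k]
        if len({y[i] for i in idx}) == 1:  # equivalence class is label-pure
            consistent += len(idx)
    return consistent / len(y)

def greedy_reduct(X, y, exclude=()):
    """Greedy forward selection of a relative reduct; `exclude` lets a second,
    diverse reduct avoid the attributes already chosen by the first."""
    target = dependency(X, y, list(range(X.shape[1])))
    remaining = [a for a in range(X.shape[1]) if a not in exclude]
    reduct = []
    while remaining and dependency(X, y, reduct) < target:
        _, a = max((dependency(X, y, reduct + [a]), a) for a in remaining)
        reduct.append(a)
        remaining.remove(a)
    return reduct

def co_train_reduction(X_l, y_l, X_u, rounds=5, per_round=10):
    """Alternate between reduct computation and co-training: each round the two
    classifiers pseudo-label their most confident unlabeled objects, the labeled
    set grows, and the reducts are recomputed on the enlarged labeled data."""
    reducts, clfs = [], []
    for _ in range(rounds):
        r1 = greedy_reduct(X_l, y_l)
        r2 = greedy_reduct(X_l, y_l, exclude=r1) or r1  # fall back if exclusion leaves nothing
        reducts = [r1, r2]
        clfs = [DecisionTreeClassifier().fit(X_l[:, r], y_l) for r in reducts]
        if len(X_u) == 0:
            break
        # each classifier contributes its most confident pseudo-labels
        add_idx, add_lab = [], []
        for clf, r in zip(clfs, reducts):
            proba = clf.predict_proba(X_u[:, r])
            conf = np.argsort(proba.max(axis=1))[-per_round:]
            add_idx.extend(int(i) for i in conf)
            add_lab.extend(clf.predict(X_u[conf][:, r]))
        # de-duplicate objects selected by both classifiers, keeping the first label
        seen, keep_idx, keep_lab = set(), [], []
        for i, lab in zip(add_idx, add_lab):
            if i not in seen:
                seen.add(i)
                keep_idx.append(i)
                keep_lab.append(lab)
        X_l = np.vstack([X_l, X_u[keep_idx]])
        y_l = np.concatenate([y_l, np.asarray(keep_lab)])
        X_u = np.delete(X_u, keep_idx, axis=0)
    return reducts, clfs
```

In each round the two classifiers are retrained on reducts recomputed from the enlarged labeled set, mirroring the loop in the abstract; selecting pseudo-labels by prediction confidence is a common co-training heuristic and is assumed here rather than taken from the paper.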
