Paper information - Concept Sampling: Towards Systematic Selection in Large-Scale Mixed Concepts in Machine Learning

This paper addresses the problem of concept sampling. In many real-world applications, a large collection of mixed concepts is available for decision making. However, the collection is often so large that using those concepts directly is difficult, if not unrealistic, because of domain-specific limits on available space or time. This naturally creates the need for concept reduction. In this paper, we introduce the novel problem of concept sampling: finding, in advance, the optimal subset of a large collection of mixed concepts so that the performance of future decision making is best preserved by selectively combining the concepts remaining in the subset. The problem is formulated as an optimization process based on our derivation of a target function, which establishes a clear connection between the composition of the concept subset and the expected error of future decision making on that subset. Based on this target function, a sampling algorithm is developed and its effectiveness is discussed. Extensive empirical studies suggest that the proposed concept sampling method preserves the performance of decision making well while dramatically reducing the number of concepts maintained, which justifies its usefulness in handling large-scale mixed concepts.
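To make the problem statement concrete, the following is a minimal sketch of subset selection under stated assumptions. It is not the paper's target function or sampling algorithm, which the abstract does not specify. It assumes a hypothetical matrix `error[c][t]` holding the error of concept `c` on validation task `t`, models "selectively combining" as using the best retained concept for each task, and uses a plain greedy heuristic. The names `greedy_concept_sample` and `subset_cost` are illustrative only.

```python
from typing import List, Sequence


def subset_cost(error: Sequence[Sequence[float]], subset: Sequence[int]) -> float:
    """Mean over tasks of the lowest error achieved by any retained concept."""
    n_tasks = len(error[0])
    return sum(min(error[c][t] for c in subset) for t in range(n_tasks)) / n_tasks


def greedy_concept_sample(error: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily keep up to k concepts (assumed heuristic, not the paper's method).

    error[c][t] is the (hypothetical) validation error of concept c on task t.
    At each step, add the concept that most lowers the summed per-task best error.
    """
    n_concepts = len(error)
    n_tasks = len(error[0])
    chosen: List[int] = []
    # Best error seen so far for each task among the chosen concepts.
    best = [float("inf")] * n_tasks

    for _ in range(min(k, n_concepts)):
        candidates = [c for c in range(n_concepts) if c not in chosen]

        def cost_if_added(c: int) -> float:
            return sum(min(best[t], error[c][t]) for t in range(n_tasks))

        pick = min(candidates, key=cost_if_added)
        chosen.append(pick)
        best = [min(best[t], error[pick][t]) for t in range(n_tasks)]

    return chosen


if __name__ == "__main__":
    # Four concepts evaluated on three tasks (made-up numbers for illustration).
    errors = [
        [0.10, 0.90, 0.90],
        [0.90, 0.10, 0.10],
        [0.50, 0.50, 0.50],
        [0.12, 0.88, 0.90],
    ]
    kept = greedy_concept_sample(errors, k=2)
    print(kept, subset_cost(errors, kept))
```

In this toy setup, two complementary concepts cover all three tasks nearly as well as the full collection would, which is the kind of reduction the abstract describes. A greedy search of this sort does not guarantee the optimal subset.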

Xiaoming Jin | Yi Zhang
