Concept Sampling: Towards Systematic Selection in Large-Scale Mixed Concepts in Machine Learning

This paper addresses the problem of concept sampling. In many real-world applications, a large collection of mixed concepts is available for decision making. However, the collection is often so large that using the concepts directly is difficult, if not impractical, because of domain-specific limits on available space or time. This naturally creates the need for concept reduction. In this paper, we introduce the novel problem of concept sampling: finding, in advance, the optimal subset of a large collection of mixed concepts so that the performance of future decision making, which selectively combines the concepts remaining in the subset, is best preserved. The problem is formulated as an optimization process based on our derivation of a target function, which establishes a clear connection between the composition of the concept subset and the expected error of future decision making upon the subset. Based on this target function, we then develop a sampling algorithm and discuss its effectiveness. Extensive empirical studies suggest that the proposed concept sampling method preserves the performance of decision making well while dramatically reducing the number of concepts maintained, thus justifying its usefulness in handling large-scale mixed concepts.
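The abstract does not give the target function, so the following is only an illustration of the general shape of the subset-selection problem, using a plainly different and well-known stand-in: greedy forward selection (ensemble pruning). It keeps the k concepts whose unweighted majority vote has the lowest error on a held-out validation set. It assumes that each concept is a binary classifier whose 0/1 predictions on the validation set have been precomputed. The function names `majority_vote_error` and `greedy_concept_sampling` are hypothetical, and this is not the paper's sampling algorithm.

```python
import numpy as np


def majority_vote_error(preds, labels):
    """Error rate of an unweighted majority vote over binary (0/1) predictions.

    preds: array of shape (n_concepts_in_subset, n_samples).
    labels: array of shape (n_samples,).
    Ties (exactly half the votes) are resolved toward class 1.
    """
    votes = (preds.mean(axis=0) >= 0.5).astype(int)
    return float(np.mean(votes != labels))


def greedy_concept_sampling(all_preds, labels, k):
    """Hypothetical stand-in objective, not the paper's target function.

    Greedily picks up to k concept indices, each time adding the concept that
    minimizes the validation error of the majority vote over the chosen subset.
    """
    k = min(k, all_preds.shape[0])  # cannot select more concepts than exist
    selected = []
    remaining = list(range(all_preds.shape[0]))
    for _ in range(k):
        best_idx, best_err = None, np.inf
        for i in remaining:
            # Fancy indexing with a list of row indices gives a (len, n) array.
            err = majority_vote_error(all_preds[selected + [i]], labels)
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Usage with three toy concepts: perfect, always wrong, and one error.
y = np.array([0, 1, 1, 0, 1])
P = np.array([
    [0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
])
print(greedy_concept_sampling(P, y, 2))
```

Greedy selection appears here only because it is a simple, common baseline for subset search. Its objective, the validation error of one fixed combination rule, is much narrower than the expected error of future decision making that the paper's target function is said to capture.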