Proposal and Empirical Comparison of a Parallelizable Distance-Based Discretization Method

Many classification algorithms are designed to work with datasets that contain only discrete attributes. Discretization is the process of converting the continuous attributes of a dataset into discrete ones so that such algorithms can be applied. In this paper we first review previous work on discretization. We then propose a new discretization method based on a distance proposed by Lopez de Mantaras and show that it can be easily implemented in parallel, with a substantial improvement in its computational complexity. Finally, we show empirically that our method performs very well compared with other state-of-the-art methods.
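
As a rough illustration of the kind of distance-based criterion referred to above, the following sketch computes the normalized distance between two partitions in its commonly cited form, d(A, B) = 2 - (H(A) + H(B)) / H(A, B), where H denotes Shannon entropy, and uses it to pick a single cut point for one continuous attribute. The function names (`entropy`, `mantaras_distance`, `best_cut`) and the midpoint candidate strategy are assumptions made for this example; the sketch is not the method proposed in this paper, whose details are not given in this abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mantaras_distance(part_a, part_b):
    """Normalized distance between two partitions of the same examples.

    Each partition is a list giving the block label of every example.
    Assumed form: 2 - (H(A) + H(B)) / H(A, B), which lies in [0, 1].
    """
    h_joint = entropy(list(zip(part_a, part_b)))
    if h_joint == 0.0:
        # Both partitions consist of a single block, so they coincide.
        return 0.0
    return 2.0 - (entropy(part_a) + entropy(part_b)) / h_joint


def best_cut(values, classes):
    """Hypothetical helper: choose the midpoint cut whose binary partition
    is closest to the class partition. Returns None if no cut exists."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda cut: mantaras_distance([v <= cut for v in values], classes),
    )
```

Because each candidate cut is scored independently of the others, the scoring step in a sketch like this could in principle be distributed across workers; whether this matches the parallelization described in the paper is not something the abstract states.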