Minimizing Impurity Partition Under Constraints

Set partitioning is a key component of many algorithms in machine learning, signal processing, and communications. In general, finding a partition that minimizes a given impurity (loss function) is NP-hard, and a wealth of literature exists on approximate algorithms and theoretical analyses of the partitioning problem under different settings. In this paper, we formulate and solve a variant of the partition problem called the minimum impurity partition under constraint (MIPUC). MIPUC finds an optimal partition that minimizes a given loss function subject to a given concave constraint. MIPUC generalizes the recently proposed deterministic information bottleneck problem, which finds an optimal partition that maximizes the mutual information between the input and the partition output while minimizing the partition output entropy. Our proposed algorithm is developed based on a novel optimality condition that allows us to find a locally optimal solution efficiently. Moreover, we show that the optimal partition is a hard partition induced by hyperplane cuts in the space of posterior probabilities, which in turn yields a polynomial-time algorithm for finding the globally optimal partition. Both theoretical and numerical results are provided to validate the proposed algorithm.
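To make the formulation concrete, the following is a minimal Python sketch, not the paper's algorithm. It scores hard partitions of a toy binary-input channel under an objective of the deterministic-information-bottleneck form: impurity H(X|Z) plus a concave term beta * H(Z). It also illustrates the hyperplane-cut property in the simplest (binary-input) case, where cuts in posterior space reduce to thresholds on p(X=0|y): outputs are sorted by that posterior and only contiguous groupings are enumerated. The joint distribution, the choice of entropy impurity, beta, and K are all hypothetical.

```python
import numpy as np
from itertools import combinations

def entropy(p):
    """Shannon entropy in bits of a (sub)distribution given as a numpy array."""
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

def objective(p_xy, groups, beta=0.5):
    """Impurity H(X|Z) plus beta * H(Z) for a hard partition 'groups'
    of the channel-output indices (columns of p_xy)."""
    cost, pz = 0.0, []
    for g in groups:
        block = p_xy[:, g].sum(axis=1)   # joint mass of (X, Z = this cell)
        w = block.sum()                  # p(Z = this cell)
        pz.append(w)
        if w > 0:
            cost += w * entropy(block / w)  # impurity contribution of the cell
    return cost + beta * entropy(np.array(pz))

# Toy joint distribution p(x, y): 2 inputs, 5 outputs (hypothetical numbers).
p_xy = np.array([[0.05, 0.10, 0.15, 0.10, 0.10],
                 [0.20, 0.10, 0.05, 0.10, 0.05]])
p_xy /= p_xy.sum()

# Sort outputs by the posterior p(X=0 | y); for binary inputs, hyperplane
# cuts in posterior space are simply thresholds along this ordering.
posterior0 = p_xy[0] / p_xy.sum(axis=0)
order = np.argsort(posterior0)

K, n, best = 2, p_xy.shape[1], None  # K = number of partition cells
# Enumerate all placements of K-1 cut points among the sorted outputs.
for cuts in combinations(range(1, n), K - 1):
    bounds = (0,) + cuts + (n,)
    groups = [list(order[bounds[i]:bounds[i + 1]]) for i in range(K)]
    val = objective(p_xy, groups)
    if best is None or val < best[0]:
        best = (val, groups)

print("best objective:", best[0], "partition (output indices):", best[1])
```

Even this brute-force threshold search runs in time polynomial in the number of outputs for fixed K, which is the structural point the abstract's hyperplane result exploits in the general case.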
