Paper information - Properties of Minimizing Entropy

Properties of Minimizing Entropy

Compact data representations are one approach to improving the generalization of learned functions. We explicitly illustrate the relationship between entropy and cardinality, both measures of compactness, including how gradient descent on the former reduces the latter. Whereas entropy is distribution-sensitive, cardinality is not. We propose a third compactness measure that is a compromise between the two: expected cardinality, or the expected number of unique states in a finite number of draws, which is more meaningful than standard cardinality because it discounts states with negligible probability mass. We show that minimizing entropy also minimizes expected cardinality.
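As a rough illustration of the three compactness measures named in the abstract, the sketch below compares them on two small discrete distributions. It assumes that expected cardinality means the expected number of distinct states seen in n independent draws, which by linearity of expectation equals the sum over states of 1 - (1 - p_i)^n; the paper's exact definition may differ. The function names and the example distributions are hypothetical, chosen only for this illustration.

```python
import math

def entropy(p):
    """Shannon entropy in nats; zero-probability states contribute nothing."""
    return -sum(q * math.log(q) for q in p if q > 0)

def cardinality(p):
    """Number of states with non-zero probability (ignores how mass is spread)."""
    return sum(1 for q in p if q > 0)

def expected_cardinality(p, n):
    """Expected number of distinct states in n i.i.d. draws (assumed definition).

    State i appears at least once with probability 1 - (1 - p_i)^n, and the
    expectation of the count is the sum of these probabilities.
    """
    return sum(1 - (1 - q) ** n for q in p)

# Two hypothetical distributions over four states.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.97, 0.01, 0.01, 0.01]

for name, p in [("uniform", uniform), ("skewed", skewed)]:
    print(name, cardinality(p), round(entropy(p), 3),
          round(expected_cardinality(p, 10), 3))
```

Both distributions have the same cardinality, yet the skewed one has lower entropy and, under the assumed definition, a lower expected cardinality for 10 draws, which is the sense in which expected cardinality discounts states with negligible probability mass.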
