Less bits is more: How pruning deep binary networks increases weight capacity
Binary networks are extremely efficient because multiplications and additions are replaced by cheap bitwise operations. Yet binarizing a network reduces its representational power: a binary weight restricted to -1 or +1 cannot carry as much information as a real-valued weight. We observe that pruning weights adds the value 0 as an additional symbol and thus increases the information capacity of the network. This enlarges the solution space of the network -- more network configurations become possible. Thus far, all hypotheses are considered equally likely. However, given that the network is binary, assuming a Bernoulli prior over the weights restricts the hypothesis space to only those hypotheses that can be effectively encoded in a binary network. We show that this view leads to maximizing the information capacity of the binary weights. In this work we propose to jointly prune binary weights and maximize the information capacity, thus finding a subnetwork that performs better than the original network. On 3 datasets and 11 architectures we obtain compact models with good accuracy that compare favorably to the state of the art.
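To make the capacity argument concrete, the following is a minimal sketch (not the paper's training objective) of why adding the 0 symbol through pruning raises the per-weight information capacity. The pruning fraction `p` and the helper `symbol_entropy` are illustrative names chosen here, not taken from the paper.

```python
import numpy as np

def symbol_entropy(probs):
    """Shannon entropy (in bits) of a discrete symbol distribution."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return float(-(probs * np.log2(probs)).sum())

# Dense binary weight: two equally likely symbols {-1, +1} -> 1 bit per weight.
binary_capacity = symbol_entropy([0.5, 0.5])

# Pruned binary weight: a fraction p of weights becomes 0, the remainder
# splits evenly between -1 and +1 (hypothetical choice p = 0.3).
p = 0.3
ternary_capacity = symbol_entropy([p, (1 - p) / 2, (1 - p) / 2])

print(f"binary:  {binary_capacity:.3f} bits/weight")   # 1.000
print(f"pruned:  {ternary_capacity:.3f} bits/weight")  # ~1.581
```

For moderate pruning fractions the per-weight entropy exceeds the 1-bit capacity of a dense binary weight, peaking at log2(3) ≈ 1.585 bits when the three symbols {-1, 0, +1} are equally likely; very aggressive pruning drives it back down as the distribution concentrates on 0.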