Boolean autoencoders and hypercube clustering complexity

We introduce and study the properties of Boolean autoencoder circuits. In particular, we show that the Boolean autoencoder circuit problem is equivalent to a clustering problem on the hypercube. We show that clustering m binary vectors on the n-dimensional hypercube into k clusters is NP-hard as soon as the number of clusters scales like $m^\epsilon$ ($\epsilon > 0$), and thus the general Boolean autoencoder problem is also NP-hard. We prove that the linear Boolean autoencoder circuit problem is also NP-hard, and so are several related problems, such as subspace identification over finite fields, linear regression over finite fields, even/odd set intersections, and parity circuits. The emerging picture is that autoencoder optimization is NP-hard in the general case, with a few notable exceptions, including the linear cases over infinite fields and the Boolean case with a fixed-size hidden layer. However, learning can be tackled by approximate algorithms, including alternate optimization, suggesting a new class of learning algorithms for deep networks, including deep networks of threshold gates or artificial neurons.
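To make the clustering view concrete, the following is a minimal sketch of one possible alternate-optimization heuristic for clustering binary vectors on the hypercube under Hamming distance. It is an illustration only, not the algorithm from the paper: the function names (`hamming`, `majority`, `cluster_hypercube`), the random initialization, and the tie-breaking rule (ties go to 0) are all assumptions. The assignment step plays the role of the encoder (each vector is mapped to a cluster index) and the majority-vote step plays the role of the decoder (each index is mapped back to a hypercube vertex).

```python
import random


def hamming(a, b):
    """Hamming distance between two equal-length binary tuples."""
    return sum(x != y for x, y in zip(a, b))


def majority(vectors, n):
    """Coordinate-wise majority vote; ties are broken toward 0 (an assumption)."""
    return tuple(
        1 if 2 * sum(v[i] for v in vectors) > len(vectors) else 0
        for i in range(n)
    )


def nearest(point, centroids):
    """Index of the centroid closest to `point` in Hamming distance."""
    return min(range(len(centroids)), key=lambda j: hamming(point, centroids[j]))


def cluster_hypercube(points, k, iters=20, seed=0):
    """Alternate between assigning points and recomputing majority centroids.

    Returns (centroids, labels, cost), where cost is the total Hamming
    distortion. This heuristic only reaches a local optimum in general.
    """
    points = [tuple(p) for p in points]
    n = len(points[0])
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # hypothetical initialization choice

    for _ in range(iters):
        # "Encoder" step: assign each vector to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # "Decoder" step: recompute each centroid by majority vote.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(majority(members, n) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated

    # Recompute labels so they are consistent with the final centroids.
    labels = [nearest(p, centroids) for p in points]
    cost = sum(hamming(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, cost
```

For example, `cluster_hypercube([(0, 0, 1), (0, 1, 1), (1, 0, 1)], k=1)` returns the single majority centroid together with the total Hamming distortion. Because the general problem is NP-hard, a procedure of this kind can stall in a local optimum that depends on the initialization, so in practice one would likely run it from several seeds and keep the lowest-cost result.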
