Algebraic Machine Learning

Machine learning algorithms typically fit a large set of parameters in a preexisting model by minimizing an error function. However, error minimization eventually leads to memorization of the training dataset and a loss of the ability to generalize to other datasets. Achieving generalization therefore requires something more, for example a regularization method or stopping the training when the error on a validation dataset is minimal. Here we propose a different approach to learning and generalization that is parameter-free, fully discrete, and does not use function minimization. We use the training data to find an algebraic representation of minimal size and maximal freedom, explicitly expressed as a product of irreducible components. This algebraic representation is shown to generalize directly, giving high accuracy on test data; the smaller the representation, the better the generalization. We prove that the number of generalizing representations can be very large and that the algebra only needs to find one. We also derive and test a relationship between compression and error rate. We give results for a simple problem solved step by step, for hand-written character recognition, and for the Queens Completion problem as an example of unsupervised learning. As an alternative to statistical learning, algebraic learning may offer advantages in combining bottom-up and top-down information, in formal concept derivation from data, and in large-scale parallelization.
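
The paper's own compression-to-error relationship is not reproduced in this abstract. As a hedged illustration of the general principle that a smaller consistent representation yields a tighter test-error guarantee, the sketch below computes the classical Occam's-razor bound (Blumer et al.) for a learner whose output is consistent with all training examples and describable in a given number of bits. The function name occam_error_bound and the numeric inputs are illustrative assumptions, not the paper's formula.

    import math

    def occam_error_bound(description_bits, n_train, delta=0.05):
        """Classical Occam's-razor bound for a consistent learner.

        If a hypothesis consistent with all n_train training examples can
        be described in `description_bits` bits, the hypothesis class has
        at most 2**description_bits members, and with probability at least
        1 - delta the true error rate is bounded by
        (ln|H| + ln(1/delta)) / n_train.
        """
        ln_hypothesis_count = description_bits * math.log(2)
        return (ln_hypothesis_count + math.log(1.0 / delta)) / n_train

    if __name__ == "__main__":
        # Hypothetical representation sizes, with n_train = 60,000
        # (the size of the standard MNIST training set).
        for bits in (500, 2000, 8000):
            bound = occam_error_bound(bits, n_train=60000)
            print(f"{bits:>5} bits -> test-error bound ~ {bound:.3f}")

Under these assumptions, halving the description length roughly halves the bound, which matches the qualitative claim that accuracy improves as the algebraic representation shrinks.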
