From machine learning to machine reasoning

A plausible definition of “reasoning” could be “algebraically manipulating previously acquired knowledge in order to answer a new question”. This definition covers first-order logical inference and probabilistic inference. It also includes much simpler manipulations commonly used to build large learning systems. For instance, we can build an optical character recognition system by first training a character segmenter, an isolated character recognizer, and a language model, using appropriate labelled training sets. Adequately concatenating these modules and fine-tuning the resulting system can be viewed as an algebraic operation in a space of models. The resulting model answers a new question, namely how to convert the image of a text page into computer-readable text. This observation suggests a conceptual continuity between algebraically rich inference systems, such as logical or probabilistic inference, and simple manipulations, such as the mere concatenation of trainable learning systems. Therefore, instead of trying to bridge the gap between machine learning systems and sophisticated “all-purpose” inference mechanisms, we can algebraically enrich the set of manipulations applicable to training systems, and build reasoning capabilities from the ground up.
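The concatenation of modules described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in chosen for illustration: the `compose` helper, the glyph-code "page" format, the lookup-table recognizer, and the three-word vocabulary are assumptions, not components from the essay. The sketch shows only the composition step; the joint fine-tuning of the assembled system is not modelled.

```python
from typing import Callable, Dict, List

def compose(*modules: Callable) -> Callable:
    """Chain modules so that each output feeds the next one.

    This plays the role of the 'concatenation' operation on models.
    """
    def pipeline(x):
        for module in modules:
            x = module(x)
        return x
    return pipeline

# Hypothetical lookup table standing in for a trained character recognizer.
GLYPHS: Dict[str, str] = {"01": "c", "02": "a", "03": "t", "04": "o"}

# Hypothetical vocabulary standing in for a trained language model.
VOCAB: List[str] = ["cat", "cot", "act"]

def segmenter(page: str) -> List[str]:
    """Toy segmenter: the 'page image' is a string of glyph codes split by '|'."""
    return page.split("|")

def recognizer(glyphs: List[str]) -> List[str]:
    """Toy isolated character recognizer: unknown glyphs become '?'."""
    return [GLYPHS.get(g, "?") for g in glyphs]

def language_model(chars: List[str]) -> str:
    """Toy language model: return the same-length vocabulary word
    that differs from the raw guess in the fewest positions."""
    guess = "".join(chars)
    candidates = [w for w in VOCAB if len(w) == len(guess)]
    if not candidates:
        return guess
    return min(candidates, key=lambda w: sum(a != b for a, b in zip(w, guess)))

# The composed system answers a question none of the parts answers alone:
# mapping a whole 'page' to readable text.
ocr = compose(segmenter, recognizer, language_model)
```

Calling `ocr("01|04|03")` runs the three stages in sequence. With an unreadable first glyph, as in `ocr("99|02|03")`, the language model stage repairs the recognizer's output using the vocabulary, which is the kind of benefit the combined system has over its isolated parts.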

[69]  L. Bottou. From machine learning to machine reasoning: An essay, 2013.