Towards Neural Theorem Proving at Scale

Neural models combining representation learning and reasoning in an end-to-end trainable manner are receiving increasing interest. However, their use is severely limited by their computational complexity, which renders them unusable on real world datasets. We focus on the Neural Theorem Prover (NTP) model proposed by Rockt{\"{a}}schel and Riedel (2017), a continuous relaxation of the Prolog backward chaining algorithm where unification between terms is replaced by the similarity between their embedding representations. For answering a given query, this model needs to consider all possible proof paths, and then aggregate results - this quickly becomes infeasible even for small Knowledge Bases (KBs). We observe that we can accurately approximate the inference process in this model by considering only proof paths associated with the highest proof scores. This enables inference and learning on previously impracticable KBs.

[1]  Alexandr Andoni,et al.  Practical and Optimal LSH for Angular Distance , 2015, NIPS.

[2]  Andrew W. Moore,et al.  Very Fast EM-Based Mixture Model Clustering Using Multiresolution Kd-Trees , 1998, NIPS.

[3]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Luc De Raedt,et al.  Probabilistic inductive logic programming , 2004 .

[5]  Madan M. Gupta,et al.  Fuzzy Computing: Theory, Hardware, and Applications , 1988 .

[6]  Xuemin Lin,et al.  Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement , 2016, IEEE Transactions on Knowledge and Data Engineering.

[7]  Tim Rocktäschel,et al.  Programming with a Differentiable Forth Interpreter , 2016, ICML.

[8]  Evgeniy Gabrilovich,et al.  A Review of Relational Machine Learning for Knowledge Graphs , 2015, Proceedings of the IEEE.

[9]  Sergio Gomez Colmenarejo,et al.  Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[10]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[11]  Towards Neural Theorem Proving at Scale Anonymous , 2018 .

[12]  Yury A. Malkov,et al.  Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Vladimir Krylov,et al.  Approximate nearest neighbor algorithm based on navigable small world graphs , 2014, Inf. Syst..

[14]  Quoc V. Le,et al.  Neural Programmer: Inducing Latent Programs with Gradient Descent , 2015, ICLR.

[15]  Richard Evans,et al.  Learning Explanatory Rules from Noisy Data , 2017, J. Artif. Intell. Res..

[16]  Danqi Chen,et al.  Reasoning With Neural Tensor Networks for Knowledge Base Completion , 2013, NIPS.

[17]  Gary Marcus,et al.  Deep Learning: A Critical Appraisal , 2018, ArXiv.

[18]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[20]  Dan Klein,et al.  Learning to Compose Neural Networks for Question Answering , 2016, NAACL.

[21]  Lukasz Kaiser,et al.  Neural GPUs Learn Algorithms , 2015, ICLR.

[22]  Liya Ding,et al.  A Prolog-like inference system based on neural logic - An attempt towards fuzzy neural logic programming , 1996, Fuzzy Sets Syst..

[23]  Pushmeet Kohli,et al.  TerpreT: A Probabilistic Programming Language for Program Induction , 2016, ArXiv.

[24]  Fabian M. Suchanek,et al.  YAGO3: A Knowledge Base from Multilingual Wikipedias , 2015, CIDR.

[25]  Tim Rocktäschel,et al.  End-to-end Differentiable Proving , 2017, NIPS.

[26]  Andrew Y. Ng,et al.  Fast Gaussian Process Regression using KD-Trees , 2005, NIPS.

[27]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[28]  Marcin Andrychowicz,et al.  Learning Efficient Algorithms with Hierarchical Attentive Memory , 2016, ArXiv.

[29]  Alex Graves,et al.  Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes , 2016, NIPS.

[30]  Jian Sun,et al.  Optimized Product Quantization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Leonid Boytsov,et al.  Engineering Efficient and Effective Non-metric Space Library , 2013, SISAP.

[32]  Lihong Li,et al.  Neuro-Symbolic Program Synthesis , 2016, ICLR.

[33]  Krysia Broda,et al.  Neural-symbolic learning systems - foundations and applications , 2012, Perspectives in neural computing.