Informed Machine Learning - Towards a Taxonomy of Explicit Integration of Knowledge into Machine Learning

Despite its great successes, machine learning can reach its limits when dealing with insufficient training data. A potential solution is to incorporate additional knowledge into the training process, which leads to the idea of informed machine learning. We present a research survey and structured overview of various approaches in this field. We aim to establish a taxonomy that can serve as a classification framework, considering the kind of additional knowledge, its representation, and its integration into the machine learning pipeline. The evaluation of numerous papers on the basis of the taxonomy uncovers key methods in this field.
