Propositionalization and embeddings: two sides of the same coin

Data preprocessing is an important component of machine learning pipelines and requires considerable time and resources. An integral part of preprocessing is transforming data into the format required by a given learning algorithm. This paper outlines some of the modern data processing techniques used in relational learning that enable data of different input types and formats to be fused into a single-table representation, focusing on the propositionalization and embedding approaches to data transformation. While both approaches aim at transforming data into a tabular format, they use different terminology and task definitions, are perceived to address different goals, and are used in different contexts. This paper contributes a unifying framework that improves the understanding of these two data transformation techniques by presenting their unified definitions and by explaining the similarities and differences between the two approaches as variants of a unified complex data transformation task. In addition to the unifying framework, the novelty of this paper is a unifying methodology that combines propositionalization and embeddings and benefits from the advantages of both in solving complex data transformation and learning tasks. We present two efficient implementations of the unifying methodology, an instance-based PropDRM approach and a feature-based PropStar approach to data transformation and learning, together with their empirical evaluation on several relational problems. The results show that the new algorithms can outperform existing relational learners and can solve much larger problems.
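As an illustration of the two transformations discussed above, the following minimal sketch first propositionalizes a toy two-table database into a single binary table and then compresses that table into dense vectors. It is not the PropDRM or PropStar implementation: the toy tables, the function names `propositionalize` and `embed`, the `table_attribute_value` feature naming, and the use of a truncated SVD as the embedding step are all assumptions made for this example.

```python
import numpy as np

# Hypothetical toy relational data: a main table of customers and a
# one-to-many table of orders linked by customer_id.
customers = [
    {"id": 1, "segment": "retail"},
    {"id": 2, "segment": "business"},
    {"id": 3, "segment": "retail"},
]
orders = [
    {"customer_id": 1, "category": "books"},
    {"customer_id": 1, "category": "games"},
    {"customer_id": 2, "category": "books"},
    {"customer_id": 3, "category": "games"},
    {"customer_id": 3, "category": "music"},
]


def propositionalize(customers, orders):
    """Flatten both tables into one binary instance-by-feature table."""
    bags = []
    for c in customers:
        # Each instance becomes a set of symbolic table_attribute_value items.
        items = {f"customers_segment_{c['segment']}"}
        for o in orders:
            if o["customer_id"] == c["id"]:
                items.add(f"orders_category_{o['category']}")
        bags.append(items)
    features = sorted(set().union(*bags))
    table = np.array(
        [[1.0 if f in bag else 0.0 for f in features] for bag in bags]
    )
    return features, table


def embed(table, dim):
    """Compress the sparse table into dense vectors (truncated SVD stand-in)."""
    u, s, _ = np.linalg.svd(table, full_matrices=False)
    return u[:, :dim] * s[:dim]


features, table = propositionalize(customers, orders)
vectors = embed(table, dim=2)
print(features)
print(table.shape, vectors.shape)
```

The binary table keeps interpretable, symbolic column names, while the dense vectors trade that interpretability for a compact numeric representation; both are single-table outputs that a standard learner can consume.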
