On Embeddings in Relational Databases

We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding. Low-dimensional embeddings aim to provide a concise vector representation of an underlying dataset with minimal loss of information. Embeddings of entities in a relational database have been less explored because of the intricate data relations and representational complexity involved. A relational database is an interwoven collection of relations that not only model relationships between entities but also record complex domain-specific quantitative and temporal attributes that further define those relationships. Recent methods for learning an embedding take a naive approach: they fully denormalize the database by materializing the full join of all tables and represent the result as a knowledge graph. This popular approach is limited because it fails to capture inter-row relationships and the additional semantics encoded in the relational database. In this paper we demonstrate a better methodology for learning representations, one that exploits the underlying semantics of the columns in a table while using the relational joins and the latent inter-row relationships. Empirical results on a real-world database, with evaluations on similarity join and table completion tasks, support our proposition.
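To make the naive baseline concrete, the following is a minimal sketch of full denormalization, not the method proposed in the paper. It assumes a hypothetical two-table schema (`employee`, `department`) and a hypothetical `column=value` token format; the abstract specifies neither. Each joined row becomes one token list that could be passed to an embedding trainer.

```python
import sqlite3


def build_toy_db():
    """Create a hypothetical two-table database in memory (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (
            dept_id INTEGER PRIMARY KEY, dept_name TEXT, city TEXT);
        CREATE TABLE employee (
            emp_id INTEGER PRIMARY KEY, name TEXT, title TEXT,
            dept_id INTEGER REFERENCES department(dept_id));
        INSERT INTO department VALUES (1, 'Research', 'Pune'), (2, 'Sales', 'Delhi');
        INSERT INTO employee VALUES
            (1, 'Asha', 'Engineer', 1),
            (2, 'Ravi', 'Engineer', 1),
            (3, 'Meera', 'Manager', 2);
        """
    )
    return conn


def denormalize(conn):
    """Materialize the full join of the toy tables (the naive baseline)."""
    cur = conn.execute(
        "SELECT e.name AS name, e.title AS title, "
        "       d.dept_name AS dept_name, d.city AS city "
        "FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id "
        "ORDER BY e.emp_id"
    )
    columns = [col[0] for col in cur.description]
    return columns, cur.fetchall()


def rows_to_sentences(columns, rows):
    """Turn each joined row into a list of 'column=value' tokens.

    The token format is an assumption made for this sketch.
    """
    return [[f"{c}={v}" for c, v in zip(columns, row)] for row in rows]


conn = build_toy_db()
columns, rows = denormalize(conn)
sentences = rows_to_sentences(columns, rows)
for sentence in sentences:
    print(sentence)
```

In this sketch every joined row is an isolated token list, so nothing links the two `Engineer` rows beyond their shared tokens. That is one way to picture the inter-row limitation described above.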
