Improving Textual Network Embedding with Global Attention via Optimal Transport

Constructing highly informative network embeddings is an important tool for network analysis. A network embedding encodes network topology, along with other useful side information, into low-dimensional node-level feature representations that can be exploited by statistical models. This work focuses on learning context-aware network embeddings augmented with text data. We reformulate the network-embedding problem and present two novel strategies that improve over traditional attention mechanisms: (i) a content-aware sparse attention module based on optimal transport, and (ii) a high-level attention parsing module. Our approach yields naturally sparse and self-normalized relational inference, and it can capture long-range interactions between sequences, thus addressing challenges faced by existing textual network embedding schemes. Extensive experiments demonstrate that our model consistently outperforms state-of-the-art alternatives.
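To make the optimal-transport view of attention concrete, the sketch below computes an entropically regularized transport plan between the token embeddings of two texts via the Sinkhorn algorithm and treats the plan as an attention matrix. This is a minimal illustration under stated assumptions, not the paper's implementation: the cosine-distance cost, uniform token marginals, and the `sinkhorn_attention` name are all choices made here for the example. Because every row and column of the plan must match its marginal, the resulting weights are self-normalized, and a small regularization `eps` drives the plan toward sparsity.

```python
import numpy as np

def sinkhorn_attention(x, y, eps=0.05, n_iters=50):
    """Entropic-OT 'attention' between two embedded sequences (illustrative).

    x: (n, d) token embeddings of one text sequence
    y: (m, d) token embeddings of another sequence
    Returns an (n, m) transport plan T with uniform marginals, so the
    weights are self-normalized; smaller eps yields a sparser plan.
    """
    # Cost matrix: cosine distance between all embedding pairs
    # (an assumed cost; any ground metric would do).
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    C = 1.0 - xn @ yn.T

    # Uniform marginals over tokens on both sides.
    a = np.full(x.shape[0], 1.0 / x.shape[0])
    b = np.full(y.shape[0], 1.0 / y.shape[0])

    # Sinkhorn iterations: alternately rescale to match the marginals.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan T
```

Dividing each row of the returned plan by its marginal (`T / a[:, None]`) gives row-stochastic attention weights comparable to a softmax, except that the normalization is enforced jointly over both sequences by the transport constraints rather than row by row.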
