RSDNE: Exploring Relaxed Similarity and Dissimilarity from Completely-Imbalanced Labels for Network Embedding

Network embedding, which aims to project a network into a low-dimensional space, is an increasingly active topic in network research. Semi-supervised network embedding methods take advantage of labeled data and have shown promising performance. However, existing semi-supervised methods perform poorly in the completely-imbalanced label setting, where some classes have no labeled nodes at all. To alleviate this, we propose a novel semi-supervised network embedding method, termed Relaxed Similarity and Dissimilarity Network Embedding (RSDNE). Specifically, to benefit from completely-imbalanced labels, RSDNE guarantees both intra-class similarity and inter-class dissimilarity in an approximate way. Experimental results on several real-world datasets demonstrate the superiority of the proposed method.
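To make the two supervision signals concrete, the following is a minimal toy sketch (not RSDNE's actual objective, whose relaxed formulation is defined in the paper) of how intra-class similarity and inter-class dissimilarity can be measured over node embeddings when only some nodes are labeled; the function names, the 2-D embedding matrix `Z`, and the use of `None` for unlabeled nodes are illustrative assumptions:

```python
import numpy as np

def intra_class_similarity(Z, labels):
    """Mean squared distance between embeddings of same-class labeled nodes.
    Smaller values mean same-class nodes sit closer together."""
    total, count = 0.0, 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] is not None and labels[i] == labels[j]:
                total += np.sum((Z[i] - Z[j]) ** 2)
                count += 1
    return total / count if count else 0.0

def inter_class_dissimilarity(Z, labels):
    """Mean squared distance between embeddings of differently-labeled nodes.
    Larger values mean different classes are better separated."""
    total, count = 0.0, 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if (labels[i] is not None and labels[j] is not None
                    and labels[i] != labels[j]):
                total += np.sum((Z[i] - Z[j]) ** 2)
                count += 1
    return total / count if count else 0.0

# Toy example: four nodes embedded in 2-D; node 3 is unlabeled,
# mimicking a class with no labeled nodes at all.
Z = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.5, 0.5]])
labels = [0, 0, 1, None]

print(intra_class_similarity(Z, labels))    # small: class-0 pair is close
print(inter_class_dissimilarity(Z, labels)) # large: classes 0 and 1 are far apart
```

A semi-supervised objective would typically minimize the first quantity while maximizing the second; in the completely-imbalanced setting, pairs involving unlabeled nodes (like node 3 above) contribute to neither term, which is why enforcing the two properties only approximately, as RSDNE does, becomes necessary.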
