Link prediction in graph construction for supervised and semi-supervised learning

Many real-world domains are relational in nature, consisting of sets of objects related to each other in complex ways. Many data sets, however, are flat, and applying graph-based algorithms to them first requires constructing a graph from the data. This paper aims to (i) broaden the use of graph-based algorithms and (ii) propose new techniques for graph construction from flat data. Our proposal constructs graphs using link prediction measures, which predict the existence of links between entities from an initial graph. Starting from a basic graph structure such as a minimum spanning tree, we apply a link prediction measure to add new edges to the graph. The link prediction measures considered here are based on the structural similarity of the graph and improve its connectivity. We evaluate the proposed graph construction in supervised and semi-supervised classification and find that the resulting graphs achieve better accuracy.
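The construction described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes Euclidean distance between instances, Prim's algorithm for the minimum spanning tree, and the common-neighbors score as the structural link prediction measure; the abstract does not name a specific measure. The function names and the `num_new_edges` parameter are hypothetical.

```python
import math
from itertools import combinations


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a set of undirected edges (i, j) with i < j.
    """
    n = len(points)
    # best[j] = (distance from j to the current tree, tree node realising it)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = set()
    while best:
        j = min(best, key=lambda k: best[k][0])  # closest node outside the tree
        _, i = best.pop(j)
        edges.add((min(i, j), max(i, j)))
        for k in best:  # relax distances through the newly added node
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def adjacency(n, edges):
    """Build neighbor sets from an undirected edge set."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def add_links_by_common_neighbors(n, edges, num_new_edges):
    """Add the `num_new_edges` highest-scoring non-edges to the graph.

    Score (assumed here): number of neighbors shared by the two endpoints.
    Pairs with a score of zero are never added.
    """
    adj = adjacency(n, edges)
    scored = []
    for i, j in combinations(range(n), 2):
        if j not in adj[i]:
            score = len(adj[i] & adj[j])
            if score > 0:
                scored.append((score, i, j))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))  # deterministic tie-breaking
    new_edges = {(i, j) for _, i, j in scored[:num_new_edges]}
    return set(edges) | new_edges


# Toy flat data set: six two-dimensional instances (illustrative values only).
points = [(0, 0), (1, 0), (2, 0), (0, 1), (5, 5), (6, 5)]
mst = minimum_spanning_tree(points)
augmented = add_links_by_common_neighbors(len(points), mst, num_new_edges=2)
print(sorted(mst))
print(sorted(augmented - mst))
```

The resulting edge set could then be handed to any graph-based classifier. Because the scores are computed once on the initial tree, this sketch performs a single round of link prediction; whether edges should be added iteratively, and how many, is a design choice the abstract leaves open.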
