Dimensionality Reduction for Supervised Learning in Link Prediction Problems

In recent years, considerable attention has been devoted to research on complex networks and their properties. Collaborative environments, social networks and recommender systems are popular examples of complex networks that have emerged recently and are objects of interest in academia and industry. Many studies model complex networks as graphs and tackle the link prediction problem, a major open question in network evolution. The problem consists of predicting the likelihood that an association will appear between two nodes of a graph that are not currently connected. One approach to this problem is based on supervised learning for binary classification. Although the curse of dimensionality is a long-standing obstacle in machine learning, little effort has been made to address it in the link prediction scenario. This paper therefore evaluates the effects of dimensionality reduction as a preprocessing stage before the construction of the binary classifier in link prediction applications. Two dimensionality reduction strategies are evaluated: Principal Component Analysis (PCA) and Forward Feature Selection (FFS). The results of experiments with three different datasets and four traditional machine learning algorithms show that dimensionality reduction with PCA and FFS can improve model precision in this kind of problem.
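As a rough illustration of the supervised setting described above, the following Python sketch builds a few topological features for unconnected node pairs and applies a greedy forward feature selection guided by classifier precision. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the three features, the nearest-centroid classifier, the toy graph, the hypothetical "future link" labels and all function names are chosen here only for illustration, and precision is measured on the training pairs for brevity. PCA is not shown.

```python
from itertools import combinations


def pair_features(adj, u, v):
    """Topological features for a candidate node pair (u, v)."""
    nu, nv = adj[u], adj[v]
    common = len(nu & nv)
    union = len(nu | nv)
    return [
        common,                             # common neighbors
        common / union if union else 0.0,   # Jaccard coefficient
        len(nu) * len(nv),                  # preferential attachment
    ]


def nearest_centroid_precision(X, y, cols):
    """Precision of a nearest-centroid classifier restricted to columns `cols`.

    Assumption: scored on the same pairs it was fitted on, for brevity.
    """
    def centroid(label):
        rows = [[x[c] for c in cols] for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(x, c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(cols, c))

    c0, c1 = centroid(0), centroid(1)
    pred = [1 if dist(x, c1) < dist(x, c0) else 0 for x in X]
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, y))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, y))
    return tp / (tp + fp) if tp + fp else 0.0


def forward_feature_selection(n_features, score):
    """Greedily add the feature that most improves score(subset); stop when none does."""
    selected, best = [], float("-inf")
    remaining = set(range(n_features))
    while remaining:
        cand_score, cand = max((score(selected + [f]), f) for f in sorted(remaining))
        if cand_score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Toy graph (hypothetical data): an undirected edge list turned into adjacency sets.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
adj = {n: set() for n in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Candidate pairs are the currently unconnected ones; labels mark hypothetical future links.
future_links = {(0, 3), (1, 3), (2, 4)}
pairs = [p for p in combinations(sorted(adj), 2) if p[1] not in adj[p[0]]]
X = [pair_features(adj, u, v) for u, v in pairs]
y = [1 if p in future_links else 0 for p in pairs]

selected, precision = forward_feature_selection(
    3, lambda cols: nearest_centroid_precision(X, y, cols)
)
print("selected feature indices:", selected, "precision:", precision)
```

The selection routine only sees a scoring callback, so the same loop could wrap any classifier and any evaluation protocol, such as precision on a held-out set of node pairs.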
