Link prediction in citation networks

In this article, we formulate the prediction of citations among papers as a link prediction problem and build models for five large-scale citation network datasets. We apply a supervised machine-learning model with 11 features. The learner performs well, with F1 values between 0.74 and 0.82. Three features in particular largely affect the predictions of citations: the link-based Jaccard coefficient, the difference in betweenness centrality, and the cosine similarity of term frequency-inverse document frequency (TF-IDF) vectors. The results also indicate that different models are required for different types of research areas: research fields with a single issue and research fields with multiple issues. In research fields with multiple issues, papers tend to be cited locally within each research field, which indicates barriers among those fields. Therefore, one must consider the typology of the targeted research area when building models for link prediction in citation networks.
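The three most influential features named above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the token lists, the function names, the choice to treat citations as undirected edges, the use of an absolute difference for betweenness, and the specific TF-IDF weighting (raw count times log(N/df)) are all assumptions made for illustration. The F1 helper uses the standard definition of the metric.

```python
import math
from collections import Counter, deque

def jaccard(neighbors, u, v):
    """Link-based Jaccard coefficient: shared neighbors over all neighbors."""
    union = neighbors[u] | neighbors[v]
    return len(neighbors[u] & neighbors[v]) / len(union) if union else 0.0

def betweenness(neighbors):
    """Unnormalized betweenness (Brandes' algorithm, unweighted undirected graph)."""
    score = {n: 0.0 for n in neighbors}
    for s in neighbors:
        order, preds = [], {n: [] for n in neighbors}
        sigma = {n: 0 for n in neighbors}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in neighbors}
        for w in reversed(order):           # accumulate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {n: c / 2 for n, c in score.items()}  # each pair was counted twice

def tfidf_vectors(docs):
    """docs maps paper id -> token list; returns sparse TF-IDF vectors."""
    n = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    return {p: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for p, tokens in docs.items()}

def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    norm = math.sqrt(sum(w * w for w in x.values())) * \
           math.sqrt(sum(w * w for w in y.values()))
    return dot / norm if norm else 0.0

def f1(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical toy data: four papers, citations treated as undirected edges.
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p1", "p3")]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

docs = {
    "p1": ["link", "prediction", "citation"],
    "p2": ["link", "prediction", "network"],
    "p3": ["battery", "material"],
    "p4": ["citation", "network", "battery"],
}
bc = betweenness(neighbors)
vecs = tfidf_vectors(docs)

def pair_features(u, v):
    """Feature vector for a candidate citation between papers u and v."""
    return [jaccard(neighbors, u, v), abs(bc[u] - bc[v]), cosine(vecs[u], vecs[v])]

print(pair_features("p1", "p4"))
```

Feature vectors of this kind, computed for pairs of papers with and without a citation, could then be passed to any supervised classifier; the choice of classifier is left open here.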
