Link Prediction in Linked Data of Interspecies Interactions Using Hybrid Recommendation Approach

Linked Open Data for ACademia (LODAC), together with the National Museum of Nature and Science, has started collecting linked data on interspecies interactions and predicting links for future observations. The initial data is very sparse and disconnected, which makes it difficult to predict potential missing links with any single prediction model. In this paper, we introduce Link Prediction in Interspecies Interaction network (LPII), which addresses this problem with a hybrid recommendation approach. Our prediction model combines three scoring functions that take into account collaborative filtering, community structure, and biological classification. We found LPII to be more accurate than other combinations of scoring functions. Using significance testing, we confirm that all three scoring functions are significant for LPII and that they play different roles depending on the conditions of the linked data. This indicates that LPII can be applied to other real-world link prediction problems.
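
To make the idea of combining three scoring functions concrete, the following is a minimal sketch and not the LPII model itself. The abstract does not give the scoring formulas or how they are combined, so everything here is an assumption for illustration: the Jaccard-based collaborative filtering score, the same-community indicator, the shared-rank taxonomy score, the equal-weight linear combination, and all names (`hybrid_score`, `partners`, `community`, `lineage`, and the toy species).

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_rank_fraction(lineage_a, lineage_b):
    """Fraction of leading taxonomic ranks that two lineages share."""
    shared = 0
    for rank_a, rank_b in zip(lineage_a, lineage_b):
        if rank_a != rank_b:
            break
        shared += 1
    return shared / max(len(lineage_a), len(lineage_b), 1)


def hybrid_score(u, v, partners, community, lineage, weights=(1/3, 1/3, 1/3)):
    """Score a candidate link (u, v) as a weighted sum of three signals.

    The weighted sum is an assumed combination rule, chosen for simplicity.
    """
    # Species already known to interact with v, excluding u itself.
    known = partners.get(v, set()) - {u}
    # Collaborative filtering: does u interact with the same species as
    # some known partner of v?
    cf = max((jaccard(partners.get(u, set()), partners.get(w, set()))
              for w in known), default=0.0)
    # Community structure: 1.0 when u and v fall in the same community.
    same = community.get(u) is not None and community.get(u) == community.get(v)
    com = 1.0 if same else 0.0
    # Biological classification: closest taxonomic relative of u among
    # the known partners of v.
    tax = max((shared_rank_fraction(lineage[u], lineage[w]) for w in known),
              default=0.0)
    w_cf, w_com, w_tax = weights
    return w_cf * cf + w_com * com + w_tax * tax


# Hypothetical toy data: two fungi and two host trees.
partners = {
    "fungus_a": {"tree_x", "tree_y"},
    "fungus_b": {"tree_x"},
    "tree_x": {"fungus_a", "fungus_b"},
    "tree_y": {"fungus_a"},
}
community = {"fungus_a": 0, "fungus_b": 0, "tree_x": 0, "tree_y": 1}
lineage = {
    "fungus_a": ("Fungi", "Ascomycota", "GenusA"),
    "fungus_b": ("Fungi", "Ascomycota", "GenusB"),
    "tree_x": ("Plantae", "Tracheophyta", "GenusX"),
    "tree_y": ("Plantae", "Tracheophyta", "GenusY"),
}

candidate = hybrid_score("fungus_b", "tree_y", partners, community, lineage)
unrelated = hybrid_score("tree_x", "tree_y", partners, community, lineage)
```

In this toy network the unobserved pair (`fungus_b`, `tree_y`) scores above zero because `fungus_b` shares a host with, and is taxonomically close to, a known partner of `tree_y`, even though the two species sit in different communities. With sparse data any one of the three signals can be zero, which is the motivation for combining them.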
