Predicting Future Links Between Disjoint Research Areas Using Heterogeneous Bibliographic Information Network

Literature-based discovery aims to uncover hidden connections between previously disconnected research areas. A heterogeneous bibliographic information network (HBIN) provides a latent, semi-structured bibliographic information model that can signal potential connections between scientific papers. This paper introduces a novel literature-based discovery method that builds meta path features from an HBIN to predict co-citation links between previously disconnected literatures. We evaluated the method on predicting future co-citation links between fish oil and Raynaud’s syndrome papers. In our experiments, HBIN meta path features predicted future co-citation links between these papers with high accuracy (F-measure 0.851; precision 0.845; recall 0.857), outperforming existing document similarity algorithms such as LDA, TF-IDF, and bibliographic coupling.
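
To make the idea of a meta path feature concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes a toy network in which papers link to authors and terms, and it uses the number of shared intermediate nodes as the path count for the length-two meta paths Paper-Author-Paper and Paper-Term-Paper. The node types, paper identifiers, neighbor sets, and function names are all hypothetical; the abstract does not state which meta paths or node types were used.

```python
# Hypothetical toy network (not data from the paper): each paper maps to
# the set of nodes it links to, grouped by node type.
PAPER_LINKS = {
    "fish_oil_1": {
        "author": {"a1", "a2"},
        "term": {"blood viscosity", "platelet aggregation"},
    },
    "raynaud_1": {
        "author": {"a2"},
        "term": {"blood viscosity", "vasoconstriction"},
    },
    "raynaud_2": {
        "author": {"a3"},
        "term": {"vasoconstriction"},
    },
}


def path_count(paper_a, paper_b, via):
    """Count instances of the meta path Paper-<via>-Paper.

    With at most one edge between a paper and a node, each shared
    intermediate node yields exactly one path instance.
    """
    return len(PAPER_LINKS[paper_a][via] & PAPER_LINKS[paper_b][via])


def meta_path_features(paper_a, paper_b, node_types=("author", "term")):
    """Build one feature per assumed meta path for a pair of papers."""
    return [path_count(paper_a, paper_b, t) for t in node_types]


if __name__ == "__main__":
    for target in ("raynaud_1", "raynaud_2"):
        print(target, meta_path_features("fish_oil_1", target))
```

In a setup like this, each paper pair would yield one feature vector, and a classifier trained on pairs with known outcomes would predict whether a co-citation link appears later.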
