Tracing database usage: Detecting main paths in database link networks

This paper presents a database link network for measuring the impact of databases on biological research. We used 20,861 full-text articles from PubMed Central in the field of bioinformatics and extracted database mentions from the methodology sections of these articles and from their references. The list of database names was taken from the 2013 Nucleic Acids Research Molecular Biology Database Collection (available online), which includes 1,512 databases. The network was constructed from the pairs of databases mentioned together in the methodology section of an article, so that each edge represents a link relationship between two databases. The weight of an edge is determined either by the link frequency of the two databases (the link-weighted database link network) or by the topic similarity between them (the similarity-weighted database link network). We analyzed the topological structure and main paths of the network to trace the usage, connection, and evolution of databases. We also conducted a content analysis by comparing content similarities among the papers citing the databases.
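
The link-weighted construction described above can be illustrated with a minimal sketch. It assumes that the database mentions have already been extracted per article and that "link frequency" means the number of articles in which two databases are mentioned together; the function name `build_link_network` and the toy input are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_link_network(article_mentions):
    """Count, for each unordered pair of databases, the articles mentioning both."""
    edge_weights = Counter()
    for mentions in article_mentions:
        # Deduplicate within an article and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(mentions)), 2):
            edge_weights[pair] += 1
    return edge_weights


# Hypothetical toy input: database names found in the methods sections of three articles.
articles = [
    ["GenBank", "UniProt", "PDB"],
    ["GenBank", "UniProt"],
    ["PDB", "UniProt", "UniProt"],
]
network = build_link_network(articles)

for (db_a, db_b), weight in sorted(network.items()):
    print(db_a, db_b, weight)
```

The similarity-weighted variant would keep the same edges and replace the count with a topic-similarity score between the two databases; that step is not sketched here because the abstract does not specify how the similarity is computed.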
