Codd's World: Topics and their Evolution in the Database Community Publication Graph

Scholarly network analysis is the study of a scientific research network with the aim of discovering meaningful insights and making data-driven research decisions. Analyzing such networks has become increasingly challenging due to the amount of scientific research that is added every day. Furthermore, online resources often include information from other online sources (e.g., academic social platforms), which makes it possible to study networks on a larger and more complex scale. In this paper, we present a study of a specific research network: the (relational) database community publication graph, which we call Codd's World, a transitive closure over citations starting from the foundational work of E. F. Codd. Specifically, we analyze the topics of the published papers, the relevance of authors and papers, and how these relate to raw publication counts. Among our findings, we show that topic modeling can be a useful entry point for scholarly network analysis.
