A generative model for scientific concept hierarchies

In many scientific disciplines, each new ‘product’ of research (method, finding, artifact, etc.) is often built upon previous findings, leading to the extension and branching of scientific concepts over time. We aim to understand the evolution of scientific concepts by placing them in phylogenetic hierarchies, where scientific keyphrases from a large, longitudinal academic corpus are used as a proxy for scientific concepts. These hierarchies exhibit several important properties, including a power-law degree distribution, a power-law component size distribution, the existence of a giant component, and a lower probability of extending an older concept. We present a generative model based on preferential attachment that simulates the graphical and temporal properties of these hierarchies. The model helps us understand the underlying process behind scientific concept evolution and may be useful in simulating and predicting scientific evolution.
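To make the idea of a preferential-attachment generative process concrete, the following is a minimal sketch of one way to grow a concept forest. It is not the model from the paper: the function names, the parameters `p_new_root` and `age_decay`, and the weighting rule (number of children plus one, multiplied by a geometric penalty on age) are all assumptions chosen for illustration. Each new concept either starts a new hierarchy or extends an existing concept, with popular and recent concepts favored.

```python
import random
from collections import Counter


def grow_concept_forest(n_nodes, p_new_root=0.1, age_decay=0.95, seed=0):
    """Grow a forest of concepts one node per time step.

    Returns a list `parent` where parent[i] is None if node i is a root
    concept, otherwise the index of the concept it extends.
    All parameters are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    parent = [None]      # node 0 is the first root concept
    n_children = [0]     # number of direct extensions of each concept

    for t in range(1, n_nodes):
        if rng.random() < p_new_root:
            # The new concept starts its own hierarchy (a new component).
            parent.append(None)
        else:
            # Preferential attachment with an age penalty (assumed form):
            # weight grows with existing children and shrinks with age.
            weights = [
                (n_children[i] + 1) * age_decay ** (t - i) for i in range(t)
            ]
            chosen = rng.choices(range(t), weights=weights, k=1)[0]
            parent.append(chosen)
            n_children[chosen] += 1
        n_children.append(0)

    return parent


def component_sizes(parent):
    """Map each root concept to the number of nodes in its hierarchy."""
    root = []
    for i, p in enumerate(parent):
        # Parents always have smaller indices, so root[p] is already known.
        root.append(i if p is None else root[p])
    return Counter(root)


if __name__ == "__main__":
    forest = grow_concept_forest(2000)
    sizes = component_sizes(forest)
    print("components:", len(sizes))
    print("largest component:", max(sizes.values()))
```

Under these assumptions, lowering `age_decay` makes older concepts less likely to be extended, and raising `p_new_root` produces more, smaller components. Whether the resulting degree and component-size distributions resemble a power law would need to be checked empirically against the simulated output.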
