Empirical analysis on a keyword-based semantic system

Abstract

Keywords in scientific articles play a significant role in information filtering and classification. In this article, we empirically investigate the statistical characteristics and evolutionary properties of keywords in a prominent journal, the Proceedings of the National Academy of Sciences of the United States of America (PNAS), including their frequency distribution, temporal scaling behavior, and decay factor. The empirical results indicate that keyword frequency in PNAS approximately follows Zipf's law with exponent 0.86. In addition, there is a power-law correlation between the cumulative number of distinct keywords and the cumulative number of keyword occurrences. We also present an extensive empirical analysis of data from several other journals, monitoring the decaying trends of the most popular keywords. Interestingly, top journals from various subjects share a very similar decaying tendency, while journals with low impact factors exhibit completely different behavior. These empirical characteristics may shed some light on the in-depth understanding of semantic evolutionary behaviors. In addition, the analysis of keyword-based systems is helpful for the design of corresponding recommender systems.
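The two scaling measurements named in the abstract can be illustrated with a minimal Python sketch. It assumes the keywords are available as a flat, chronologically ordered list of strings, and it estimates both exponents by ordinary least squares on log-log axes. That estimator, and every function name below, is an illustrative assumption rather than the procedure used in the article.

```python
import math
from collections import Counter


def fit_loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x) (hypothetical helper)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx


def zipf_exponent(keywords):
    """Estimate alpha in frequency(rank) ~ rank^(-alpha)."""
    # Sort keyword counts from most to least frequent; rank 1 is the top keyword.
    counts = sorted(Counter(keywords).values(), reverse=True)
    ranks = range(1, len(counts) + 1)
    return -fit_loglog_slope(ranks, counts)


def growth_curve(keywords):
    """Pairs (cumulative occurrences, cumulative distinct keywords)."""
    seen = set()
    curve = []
    for total, keyword in enumerate(keywords, start=1):
        seen.add(keyword)
        curve.append((total, len(seen)))
    return curve


def growth_exponent(keywords):
    """Estimate beta in distinct(total) ~ total^beta."""
    curve = growth_curve(keywords)
    totals = [total for total, _ in curve]
    distinct = [count for _, count in curve]
    return fit_loglog_slope(totals, distinct)
```

Given a real keyword stream, `zipf_exponent(stream)` would be compared against the reported value of 0.86, and `growth_exponent(stream)` would characterize the power-law relation between distinct keywords and total occurrences. A plain log-log regression is sensitive to noise in the low-frequency tail, so it should be read as a rough check only.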
