Unified mathematical treatment of complex cascaded bipartite networks: the case of collections of journal papers

A mathematical treatment is proposed for analyzing entities and the relations among them in complex networks that consist of cascaded bipartite networks. The treatment is applied to collections of journal papers, in which the entities are papers, references, paper authors, reference authors, paper journals, reference journals, institutions, terms, and term definitions. An entity-relationship model is introduced that explicitly shows the direct links between entity types and the potentially useful indirect relations. From this model, a matrix formulation and a generalized matrix arithmetic are introduced that allow relations between entities to be expressed easily and the weights of indirect links and co-occurrence links to be calculated. Occurrence matrices, equivalence matrices, membership matrices, and co-occurrence matrices are described. A dynamic growth model describes the recursive relations in occurrence and co-occurrence matrices as papers are added to the collection. Graph-theoretic matrices are introduced to support information-flow studies of networks of papers linked by their citations. Similarity calculations and similarity fusion are explained, and the derivation of feature vectors for pattern recognition techniques is presented. The relation of the proposed treatment to seriation, clustering, multidimensional scaling, and visualization techniques is discussed. It is shown that most existing bibliometric techniques for analyzing collections of journal papers are easily expressed in terms of the proposed treatment: co-citation analysis, bibliographic coupling analysis, author co-citation analysis, journal co-citation analysis, Braam-Moed-van Raan (BMV) co-citation/co-word analysis, latent semantic analysis, hubs and authorities, and multidimensional scaling. The report also describes an extensive software toolkit, developed for this research, for analyzing and visualizing entities and links in a collection of journal papers, and presents an extensive case study that analyzes and visualizes 60 years of anthrax research. For complex networks that consist of cascaded bipartite networks, the treatment provides a general mathematical framework for analyzing both static network structure and dynamic network growth. As such, it offers a basic paradigm for thinking about and modeling such networks: computing direct and indirect links, expressing and analyzing statistical distributions of network characteristics, describing network growth, deriving feature vectors, clustering, and visualizing network structure and growth.
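As a minimal illustration of the matrix formulation summarized above, the following Python sketch computes co-occurrence links and indirect link weights from small occurrence matrices. The toy data and the names `paper_ref`, `paper_author`, `co_occurrence`, and `cosine_similarity` are assumptions made for this example, not the notation or toolkit of the report. It uses the standard constructions in which a co-occurrence matrix is the product of an occurrence matrix with its transpose, and an indirect link across two cascaded bipartite networks is the product of their occurrence matrices.

```python
import numpy as np

# Hypothetical toy collection: 3 papers (rows) citing 4 references (columns).
# Entry (i, j) is 1 when paper i cites reference j.
paper_ref = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

# Hypothetical paper-to-author occurrence matrix: 3 papers (rows), 2 authors (columns).
paper_author = np.array([
    [1, 0],
    [1, 1],
    [0, 1],
])


def co_occurrence(occ):
    """Count, for each pair of row entities, the column entities they share."""
    return occ @ occ.T


def cosine_similarity(co):
    """Normalize co-occurrence counts by the diagonal (assumes no empty rows)."""
    d = np.sqrt(np.diag(co).astype(float))
    return co / np.outer(d, d)


# Bibliographic coupling: papers linked by the references they share.
bib_coupling = co_occurrence(paper_ref)

# Co-citation: references linked by the papers that cite both.
co_citation = co_occurrence(paper_ref.T)

# Indirect link across cascaded networks: authors -> papers -> references.
# Entry (a, r) counts the papers by author a that cite reference r.
author_ref = paper_author.T @ paper_ref

print(bib_coupling)
print(co_citation)
print(author_ref)
print(cosine_similarity(bib_coupling))
```

The diagonal of each co-occurrence matrix holds the occurrence count of the entity itself (references per paper, or citations per reference), which is why it can serve as the normalizer in the cosine similarity. For a real collection, sparse matrices would likely be a better fit than dense arrays.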
