Co-occurrence matrices and their applications in information science: Extending ACA to the Web environment

Co-occurrence matrices, such as cocitation, coword, and colink matrices, have been used widely in the information sciences. However, confusion and controversy have hindered the proper statistical analysis of these data. The underlying problem, in our opinion, lies in understanding the nature of the various types of matrices. This article discusses the difference between a symmetrical cocitation matrix and an asymmetrical citation matrix, as well as the statistical techniques appropriate to each. Similarity measures (such as the Pearson correlation coefficient or the cosine) should not be applied to the symmetrical cocitation matrix but can be applied to the asymmetrical citation matrix to derive the proximity matrix. The argument is illustrated with examples. The study then extends the application of co-occurrence matrices to the Web environment, in which the nature of the available data, and thus the data collection methods, differ from those of traditional databases such as the Science Citation Index. A set of data collected with the Google Scholar search engine is analyzed using both the traditional methods of multivariate analysis and the new visualization software Pajek, which is based on social network analysis and graph theory. © 2006 Wiley Periodicals, Inc.
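
The distinction between the two matrix types can be illustrated with a minimal Python sketch. It is not the article's data or procedure: the toy matrix, the author labels, and all function names are hypothetical, and binary citing is assumed (a document either cites an author or does not). The asymmetrical citation matrix has citing documents as rows and cited authors as columns; the symmetrical cocitation matrix is derived from it by counting, for each pair of authors, the documents that cite both. The cosine is then computed on the columns of the asymmetrical matrix, as the abstract recommends, rather than on the cocitation counts.

```python
from math import sqrt

# Hypothetical toy data: rows are citing documents, columns are cited authors.
# A 1 means the document cites the author (binary citing is an assumption).
AUTHORS = ["A", "B", "C"]
CITATION = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]


def column(matrix, j):
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in matrix]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def cocitation_matrix(matrix):
    """Symmetrical authors-by-authors matrix of cocitation counts.

    Cell (i, j) counts the documents citing both author i and author j.
    In this sketch the diagonal holds each author's citing-document count.
    """
    n = len(matrix[0])
    cols = [column(matrix, j) for j in range(n)]
    return [[dot(cols[i], cols[j]) for j in range(n)] for i in range(n)]


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return dot(u, v) / (sqrt(dot(u, u)) * sqrt(dot(v, v)))


def cosine_proximity(matrix):
    """Proximity matrix from the cosine between author columns
    of the asymmetrical citation matrix."""
    n = len(matrix[0])
    cols = [column(matrix, j) for j in range(n)]
    return [[cosine(cols[i], cols[j]) for j in range(n)] for i in range(n)]


cocit = cocitation_matrix(CITATION)
prox = cosine_proximity(CITATION)

print("Cocitation counts (symmetrical):")
for name, row in zip(AUTHORS, cocit):
    print(name, row)

print("Cosine proximities from the citation matrix:")
for name, row in zip(AUTHORS, prox):
    print(name, [round(x, 3) for x in row])
```

Both outputs are square and symmetrical, but they mean different things: the first holds raw co-occurrence counts, while the second holds similarities between the authors' citation profiles. Applying a similarity measure a second time to the first matrix is the step the abstract argues against.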
