The relation between Pearson's correlation coefficient r and Salton's cosine measure

The relation between Pearson's correlation coefficient and Salton's cosine measure is derived from the possible values of the ratio of the L1-norm to the L2-norm of a vector. These different values yield a sheaf of increasingly straight lines, which together form the cloud of points that constitutes the investigated relation. The theoretical results are tested against the author co-citation relations among 24 informetricians, for whom two matrices can be constructed from co-citation data: the asymmetric occurrence matrix and the symmetric co-citation matrix. Both examples fully confirm the theoretical results. The results enable us to specify an algorithm that provides a threshold value for the cosine above which none of the corresponding Pearson correlations is negative. Using this threshold value can be expected to optimize the visualization of the vector space. © 2009 Wiley Periodicals, Inc.
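
As an illustration of the relation described above, the following minimal sketch uses an algebraic identity that holds for non-negative, non-constant vectors of length n: if a and b denote the L1-norm/L2-norm ratios of the two vectors, then r = (n·cos − a·b) / sqrt((n − a²)(n − b²)), so r is non-negative whenever cos ≥ a·b/n. The function names, the example vectors, and the choice of taking the largest a·b/n over all pairs as a threshold are assumptions made for this sketch; they are not taken from the paper and may differ from the algorithm it specifies.

```python
import math
from itertools import combinations


def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Euclidean length."""
    return math.sqrt(sum(x * x for x in v))


def cosine(x, y):
    """Salton's cosine: inner product divided by the product of L2-norms."""
    return sum(a * b for a, b in zip(x, y)) / (l2_norm(x) * l2_norm(y))


def pearson(x, y):
    """Pearson's r computed directly from mean-centred values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def pearson_from_cosine(cos, n, a, b):
    """r expressed through the cosine and the two L1/L2 ratios a and b.

    Valid for non-negative, non-constant vectors (then n - a**2 > 0).
    """
    return (n * cos - a * b) / math.sqrt((n - a * a) * (n - b * b))


def cosine_threshold(vectors):
    """Hypothetical threshold: the largest a*b/n over all pairs of vectors.

    Any pair whose cosine exceeds this value has a positive Pearson r.
    """
    n = len(vectors[0])
    ratios = [l1_norm(v) / l2_norm(v) for v in vectors]
    return max(a * b for a, b in combinations(ratios, 2)) / n


# Illustrative data only (not from the paper): three non-negative vectors.
example_vectors = [
    [3, 0, 1, 2],
    [1, 2, 0, 4],
    [0, 5, 1, 0],
]

if __name__ == "__main__":
    t = cosine_threshold(example_vectors)
    print("cosine threshold:", t)
    for x, y in combinations(example_vectors, 2):
        print(cosine(x, y), pearson(x, y))
```

Under these assumptions, a visualization would retain only those links whose cosine exceeds the value returned by `cosine_threshold`, which ensures that no retained link corresponds to a negative Pearson correlation.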
