Classification and powerlaws: The logarithmic transformation

Logarithmic transformation of the data has been recommended in the literature for highly skewed distributions such as those commonly found in information science. The purpose of the transformation is to make the data conform to the lognormal law of error for inferential purposes. How does this transformation affect the analysis? We factor analyze and visualize the citation environment of the Journal of the American Chemical Society (JACS) before and after a logarithmic transformation. The transformation strongly reduces the variance necessary for classificatory purposes and is therefore counterproductive for descriptive statistics. We recommend against the logarithmic transformation when sets cannot be defined unambiguously. The intellectual organization of the sciences is reflected in the curvilinear parts of the citation distributions, while negative powerlaws fit the tails of the distributions excellently.
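The variance-reducing effect described above can be illustrated with a minimal sketch. It does not use the JACS citation data or the authors' procedure: the counts are synthetic, generated from a hypothetical rank-frequency powerlaw, and the `log(x + 1)` variant of the transformation, the function names, and the parameters `scale` and `exponent` are all assumptions made for illustration.

```python
import math
import statistics


def zipf_counts(n_items, scale=10000.0, exponent=1.0):
    """Synthetic (hypothetical) counts: count(rank) = scale / rank**exponent."""
    return [scale / (rank ** exponent) for rank in range(1, n_items + 1)]


def log_transform(values):
    """Apply log(x + 1); the +1 is an assumed variant that keeps zeros defined."""
    return [math.log(v + 1.0) for v in values]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (scale-free spread)."""
    return statistics.pstdev(values) / statistics.mean(values)


counts = zipf_counts(100)          # highly skewed, powerlaw-shaped counts
logged = log_transform(counts)     # the same counts after the transformation

raw_var = statistics.pvariance(counts)
log_var = statistics.pvariance(logged)
raw_cv = coefficient_of_variation(counts)
log_cv = coefficient_of_variation(logged)

print("variance before:", raw_var, "after:", log_var)
print("coefficient of variation before:", raw_cv, "after:", log_cv)
```

Because the raw variance depends on the measurement scale, the coefficient of variation is reported as well; in this synthetic case both drop sharply after the transformation, which is the kind of compression of differences that the abstract argues works against classification.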
