Combining full text and bibliometric information in mapping scientific disciplines

In the present study, the results of an earlier pilot study by Glenisson, Glänzel and Persson are extended to larger sets of papers. Full-text analysis and traditional bibliometric methods are combined serially to improve the efficiency of each individual method. The text-mining methodology introduced in the pilot study is applied to the complete 2003 publication year of the journal Scientometrics. Altogether, 85 documents that can be considered research articles or notes were selected for this exercise. The outcomes confirm the main result of the pilot study, namely that such a hybrid methodology can be applied to both research evaluation and information retrieval. However, the Scientometrics documents published in 2003 cover a much broader and more heterogeneous spectrum of bibliometrics and related research than those analysed in the pilot study. For validation, a modified subject classification based on the scheme used in an earlier study by Schoepflin and Glänzel has been applied.
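The abstract does not spell out the text-mining step. As a minimal sketch, assuming a plain TF-IDF vector-space representation with cosine similarity (the classic model described in [2]; the pilot study's actual term weighting, stemming [4] and any latent semantic step [24] may differ), the document-by-document similarity matrix that a clustering step would consume can be computed as below. The function names and the three toy documents are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters only. Assumption: a fuller pipeline
    # would also drop stop words and apply a stemmer (e.g. Porter's).
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)            # raw term frequency in this document
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(docs):
    """Full symmetric matrix of pairwise cosine similarities."""
    vecs = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]

# Hypothetical toy corpus standing in for the 85 selected papers.
docs = [
    "citation analysis of journals",
    "journal citation impact indicators",
    "patent mapping with co-word analysis",
]
sim = similarity_matrix(docs)
for row in sim:
    print(["%.3f" % x for x in row])
```

Documents 1 and 2 share no term here (without stemming, "journals" and "journal" stay distinct), so their similarity is zero. In a serial design of the kind the abstract mentions, one plausible reading is that such a matrix is clustered first (cf. [15]) and bibliometric indicators are then computed per cluster; that second stage is not shown.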

[1] William E. Snizek et al. Textual and nontextual characteristics of scientific papers: Neglected science indicators. Scientometrics, 2005.

[2] Michael McGill et al. Introduction to Modern Information Retrieval. 1983.

[3] Michel Zitt et al. Development of a method for detection and trend analysis of research fronts built by lexical or cocitation analysis. Scientometrics, 1994.

[4] Martin F. Porter et al. An algorithm for suffix stripping. Program, 1997.

[5] Ted Dunning et al. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, 1993.

[6] Henk F. Moed et al. Mapping of science by combined co-citation and word analysis. II: Dynamical aspects. 1991.

[7] Jean Pierre Courtial et al. Co-word analysis as a tool for describing the network of interactions between basic and technological research: The case of polymer chemistry. Scientometrics, 1991.

[8] A. F. J. van Raan et al. Handbook of Quantitative Studies of Science and Technology. 1988.

[9] Anton J. Enright et al. BioLayout: an automatic graph layout algorithm for similarity visualization. Bioinformatics, 2001.

[10] Isabelle Guyon et al. A Stability Based Method for Discovering Structure in Clustered Data. Pacific Symposium on Biocomputing, 2001.

[11] Anthony F. J. van Raan et al. Bibliometric cartography of scientific and technological developments of an R & D field. Scientometrics, 1994.

[12] Nicholas C. Mullins et al. The Structural Analysis of a Scientific Paper. 1988.

[13] Henk F. Moed et al. Mapping of Science by Combined Co-Citation and Word Analysis. I. Structural Aspects. 1991.

[14] Wolfgang Glänzel et al. Combining full-text analysis and bibliometric indicators. A pilot study. Scientometrics, 2005.

[15] Anil K. Jain et al. Algorithms for Clustering Data. 1988.

[16] Bart De Moor et al. Meta-clustering of gene expression data and literature-based information. SIGKDD Explorations, 2003.

[17] Susan T. Dumais et al. Using Linear Algebra for Intelligent Information Retrieval. SIAM Review, 1995.

[18] Wolfgang Glänzel et al. Combining full-text analysis and bibliometric indicators. 2004.

[19] Wolfgang Glänzel et al. Two decades of "Scientometrics". An interdisciplinary field represented by its leading journal. Scientometrics, 2004.

[20] Henk F. Moed et al. Mapping of science by combined co-citation and word analysis: II: Dynamical aspects. Journal of the American Society for Information Science, 1991.

[21] Tibor Braun et al. No-bells for ambiguous lists of ranked Nobelists as science indicators of national merit in physics, chemistry and medicine, 1901-2001. Scientometrics, 2004.

[22] M. F. Porter et al. An algorithm for suffix stripping. 1997.

[23] Jean-Charles Lamirel et al. New classification quality estimators for analysis of documentary information: Application to patent analysis and web mapping. Scientometrics, 2004.

[24] T. Landauer et al. Indexing by Latent Semantic Analysis. 1990.