The unification of institutional addresses applying parametrized finite-state graphs (P-FSG)

Summary

We propose a semi-automatic method, based on finite-state techniques, for the unification of corporate source data, with potential bibliometric applications. Bibliographic and citation databases suffer from well-known data inconsistencies at the micro and meso levels, which affect the quality of bibliometric searches and the evaluation of research performance. The unification method applies parametrized finite-state graphs (P-FSG) and involves three stages: (1) breaking corporate source data into independent units of analysis; (2) creating binary matrices; and (3) drawing finite-state graphs. The procedure was tested on university departmental addresses downloaded from the ISI Web of Science, and the results were evaluated with adapted measures of precision and recall. The results demonstrate the usefulness of the approach, although it still requires some manual processing.
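The three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation, and nearly everything in it is an assumption made for illustration. The comma-based splitting rule is hypothetical. So is the reading of the binary matrix as an address-by-unit incidence table, the dictionary encoding of the graph, the parameter name `<DEPT>` with its small lexicon, and the sample addresses. Precision and recall use their standard definitions rather than the paper's adaptation.

```python
import re

# Stage 1 (assumed rule): split a corporate-source string into units of
# analysis at commas, lower-casing and replacing punctuation with spaces.
def split_units(address):
    units = []
    for part in address.split(","):
        unit = " ".join(re.sub(r"[^\w\s]", " ", part.lower()).split())
        if unit:
            units.append(unit)
    return units

# Stage 2 (assumed reading): binary address-by-unit matrix; cell [i][j] is 1
# when address i contains unit j, so shared units line up in one column.
def binary_matrix(addresses):
    rows = [set(split_units(a)) for a in addresses]
    vocab = sorted(set().union(*rows))
    matrix = [[1 if u in row else 0 for u in vocab] for row in rows]
    return vocab, matrix

# Stage 3 (hypothetical encoding): a graph maps state -> [(label, next_state)].
# A label is either a set of literal word variants or a parameter name whose
# admissible values come from PARAMS. Reaching "FINAL" after consuming every
# word means the unit is recognized.
PARAMS = {"<DEPT>": {"biblioteconomia", "documentacion", "informatica"}}
GRAPH = {
    0: [({"dept", "dpto", "depto", "departamento"}, 1)],
    1: [({"de"}, 1), ("<DEPT>", "FINAL")],
}

def match(graph, params, words):
    """Return the value captured by the parameter, or None if the unit is rejected."""
    state, captured = 0, None
    for w in words:
        for label, nxt in graph.get(state, []):
            if isinstance(label, set) and w in label:
                state = nxt
                break
            if isinstance(label, str) and w in params[label]:
                state, captured = nxt, w
                break
        else:  # no transition accepted this word
            return None
    return captured if state == "FINAL" else None

def unify(address):
    """Map an address to a canonical department name, or None if no unit matches."""
    for unit in split_units(address):
        value = match(GRAPH, PARAMS, unit.split())
        if value is not None:
            return "Departamento de " + value.capitalize()
    return None

# Evaluation with the standard definitions (not the paper's adaptation):
# precision over addresses that received a form, recall over all addresses.
def precision_recall(predicted, gold):
    assigned = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in assigned if p == g)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical sample addresses and hand-made gold standard.
addresses = [
    "Univ Granada, Dpto de Informatica, Granada, Spain",
    "Univ Granada, Dept Informatica, E-18071 Granada, Spain",
    "Univ Granada, Departamento de Biblioteconomia, Granada, Spain",
    "Univ Granada, Fac Ciencias, Granada, Spain",
]
gold = ["Departamento de Informatica"] * 2 + [
    "Departamento de Biblioteconomia", "Facultad de Ciencias"]
predicted = [unify(a) for a in addresses]
print(predicted)
print(precision_recall(predicted, gold))  # (1.0, 0.75)
```

In this sketch a new spelling variant is handled by extending a label set or the parameter lexicon rather than writing a new rule. An address the graph cannot recognize is left as `None` for manual review, which mirrors the semi-automatic character of the method.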