Delineation of the genomics field by hybrid citation-lexical methods: interaction with experts and validation process

Among advanced methods for delineating and mapping scientific fields, hybrid methods offer a promising way to combine the advantages of word-based and citation-based approaches. One way to validate hybrid approaches is to work in cooperation with experts of the fields under scrutiny. We report here an experiment in the field of genomics, where a corpus of documents was built by a hybrid citation-lexical method and then clustered into research themes. Experts of the field were involved in the various stages of the process: lexical queries for building the initial set of documents, the seed; citation-based extension aiming at reducing silence; final clustering to identify noise and allow discussion of border areas. The analysis of the experts’ advice shows a high level of validation of the process, which combines a high-precision, low-recall seed, obtained by journal and lexical queries, with a citation-based extension that enhances recall. These findings on the genomics field suggest that hybrid methods can efficiently retrieve a corpus of relevant literature, even in complex and emerging fields.
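The two-step construction described above (a lexical seed followed by a citation-based extension) can be illustrated with a minimal sketch. It is not the procedure used in the study: the document structure, the function names `build_seed` and `extend_by_citation`, the toy data, and the `min_share` threshold are all hypothetical, chosen only to show how a high-precision seed might be widened through citation links to improve recall.

```python
def build_seed(docs, terms):
    """Return ids of documents whose title contains any query term (case-insensitive)."""
    terms = [t.lower() for t in terms]
    return {doc_id for doc_id, d in docs.items()
            if any(t in d["title"].lower() for t in terms)}


def extend_by_citation(docs, seed, min_share=0.5):
    """Add non-seed documents whose share of references pointing into the seed
    is at least min_share (a hypothetical threshold, not taken from the study)."""
    extended = set(seed)
    for doc_id, d in docs.items():
        refs = d["refs"]
        if doc_id in seed or not refs:
            continue
        share = sum(r in seed for r in refs) / len(refs)
        if share >= min_share:
            extended.add(doc_id)
    return extended


# Toy data (invented for illustration): titles and reference lists by document id.
docs = {
    "A": {"title": "Comparative genomics of yeast", "refs": []},
    "B": {"title": "Genome sequencing pipeline", "refs": ["A"]},
    "C": {"title": "Expression arrays in mouse", "refs": ["A", "B"]},
    "D": {"title": "Protein folding kinetics", "refs": ["X", "Y", "A"]},
}

seed = build_seed(docs, ["genomics", "genome"])          # lexical step
corpus = extend_by_citation(docs, seed, min_share=0.5)   # citation step
print(sorted(seed), sorted(corpus))
```

In this toy case, document C matches no query term but is recovered because all of its references point into the seed, whereas D, with only one of three references in the seed, stays out. Raising or lowering `min_share` trades noise against silence, which is the kind of border-area decision the experts were asked to discuss.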
