A comparison of methods for collecting web citation data for academic organizations

The primary webometric method for estimating the online impact of an organization is to count links to its website. Commercial search engines have provided link counts for over a decade, but this service was set to end by early 2012, so a replacement is needed. This article compares link counts with two alternative methods, URL citations and organization title mentions, and also introduces new variations of these methods. All three methods are compared against each other using Yahoo!, and the two alternatives (URL citations and organization title mentions) are also compared against each other using Bing. Evidence from a case study of 131 UK universities and 49 US Library and Information Science (LIS) departments suggests that Bing's Hit Count Estimates (HCEs) for popular title searches are not useful for webometric research, but that Yahoo!'s HCEs for all three types of search and Bing's URL citation HCEs seem to be consistent. For exact URL counts, the results of all three methods in Yahoo! and both methods in Bing are also consistent. Four types of accuracy factors are also introduced and defined: search engine coverage, search engine retrieval variation, search engine retrieval anomalies, and query polysemy. © 2011 Wiley Periodicals, Inc.
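To make the three methods concrete, here is a minimal Python sketch of how one query per method might be built for a single organization. The abstract does not give the query syntax, so the operators used here (`linkdomain:`, `-site:`, and quoted phrases) are assumptions based on historical search engine conventions, and the function name `build_queries` and the example organization are hypothetical.

```python
def build_queries(title: str, domain: str) -> dict[str, str]:
    """Return one illustrative query string per method for one organization.

    Assumption: the operator syntax below is illustrative only; it is not
    taken from the article and may not work in any current search engine.
    """
    # Exclude the organization's own site so only external pages are counted.
    exclude_self = f"-site:{domain}"
    return {
        # Link count: pages that hyperlink to any page in the domain.
        "link": f"linkdomain:{domain} {exclude_self}",
        # URL citation: pages that mention the domain name as text.
        "url_citation": f'"{domain}" {exclude_self}',
        # Title mention: pages that mention the organization's name.
        "title_mention": f'"{title}" {exclude_self}',
    }


# Hypothetical example organization.
queries = build_queries("University of Wolverhampton", "wlv.ac.uk")
for method, query in queries.items():
    print(f"{method}: {query}")
```

A title query such as this one is exposed to the query polysemy factor named above, since an organization name can also match unrelated pages, whereas a domain name is usually less ambiguous.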
