An automatic method for extracting citations from Google Books

Recent studies have shown that counting citations from books can help scholarly impact assessment and that Google Books (GB) is a useful source of such citation counts, despite its lack of a public citation index. Searching GB for citations produces only approximate matches, however, so its raw results need time-consuming human filtering. In response, this article introduces a method to automatically remove false and irrelevant matches from GB citation searches, together with refinements to a previous manual GB citation extraction method. The method was evaluated by manually checking sampled GB results and by comparing citations to about 14,500 monographs in the Thomson Reuters Book Citation Index (BKCI) against automatically extracted GB citations across 24 subject areas. GB citations were 103% to 137% as numerous as BKCI citations in the humanities, except for tourism (72%) and linguistics (91%); 46% to 85% as numerous in the social sciences; and only 8% to 53% as numerous in the sciences. In all cases, however, GB had substantially more citing books than did BKCI, with BKCI's results coming predominantly from journal articles. The moderate correlations between GB and BKCI citation counts in the social sciences and humanities, together with the fact that most BKCI results come from journal articles rather than books, suggest that the two sources could measure different aspects of impact.
