Managing database overlap in systematic reviews using Batch Citation Matcher: case studies using Scopus.

Systematic reviews use explicit methodology to review and synthesize research evidence in health care [1]. The objective of the method is to limit bias; therefore, a comprehensive literature search is required to identify as much of the relevant literature as possible [2]. The role of multiple database searching is twofold (Figure 1): (1) to broaden coverage to include additional sources (unique coverage) and (2) to take advantage of differences in indexing across databases to increase the chances of retrieving relevant items that are in both databases (incremental retrieval). The marginal contribution of each additional source searched is the retrieval from the unique coverage plus the incremental retrieval from overlapping coverage. For example, if 55% of relevant studies are identified after searching database 1, and 95% are identified after searching database 1 and database 2, the marginal contribution of database 2 is 40% (95% minus 55%). In some systematic reviews, 20 or more databases with overlapping content may be searched [3–7]. Managing this overlap is a pressing issue for systematic reviewers.
Figure 1. Potential contributions from searching an additional database
MEDLINE is almost universally used as a starting point in health-related systematic reviews. Numerous useful limits [8, 9] and methodological hedges are available [10]. MEDLINE indexing has greater discriminating power than the indexing of several other biomedical databases, including EMBASE [11]. Thus, it yields a smaller retrieval set without sacrificing recall.
Scopus is a new database produced by Elsevier Science. Its data sources include MEDLINE, EMBASE, open access sources, scientific Websites, and gray literature. Scopus lacks a thesaurus, and indexing is not standardized across the sources from which Scopus draws its content [12].
Searching additional databases with overlapping coverage but fewer precision-enhancing features may reintroduce irrelevant material that has already been eliminated from the retrieval in the database with the fullest feature set. A relevant item may be missed in one database when it has been assigned indexing terms different from those used by the searcher, while the same record might be retrieved from another database because the indexing in that database matches the terms selected by the searcher [13]. Without its own indexing system, however, Scopus provides little chance for such incremental retrieval, and the marginal yield of relevant records will be limited to its unique coverage [6]. This paper presents the development and testing of a technique to efficiently isolate records from unique sources.
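The split of a marginal contribution into unique coverage and incremental retrieval can be illustrated with a short sketch. The Python below is an illustration only, not part of the method reported in this paper: the function name, the record identifiers, and the toy data are all hypothetical, chosen to reproduce the 55% and 95% figures used in the example above.

```python
def marginal_contribution(relevant, found_db1, found_db2, coverage_db1):
    """Split the marginal contribution of database 2 into its two parts.

    All arguments are sets of record identifiers (hypothetical data):
      relevant      all relevant records, as in a gold standard
      found_db1     relevant records retrieved by the database 1 search
      found_db2     relevant records retrieved by the database 2 search
      coverage_db1  every record that database 1 indexes at all
    Returns (unique, incremental) as fractions of all relevant records.
    """
    # Relevant records that database 2 adds beyond the database 1 search.
    new_from_db2 = (found_db2 & relevant) - found_db1
    # Unique coverage: database 1 does not index these records.
    unique = new_from_db2 - coverage_db1
    # Incremental retrieval: indexed in database 1, but missed by its search.
    incremental = new_from_db2 & coverage_db1
    total = len(relevant)
    return len(unique) / total, len(incremental) / total


# Toy data: 20 relevant records, numbered 0 to 19.
relevant = set(range(20))
found_db1 = set(range(11))        # 11 of 20 = 55% after database 1
found_db2 = set(range(8, 19))     # combined with database 1: 19 of 20 = 95%
coverage_db1 = set(range(15))     # database 1 indexes records 0 to 14

unique, incremental = marginal_contribution(
    relevant, found_db1, found_db2, coverage_db1
)
print(f"unique coverage: {unique:.0%}, incremental retrieval: {incremental:.0%}")
```

In this toy case the 40% marginal contribution divides evenly between the two components. For a source without its own indexing, such as Scopus as characterized above, the incremental component would be expected to approach zero, leaving only the unique-coverage component.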