When size matters.

It is longstanding epidemiology textbook wisdom that follow-up (or cohort) studies come at an expense. For one, most conditions of interest, including chronic diseases, occur in the population at low rates, so that only a small proportion of a study cohort is affected per unit time of follow-up. On the other hand, the frequency and distribution of the exposures of interest, particularly when they are rare, pose further challenges to the informative value of a prospective study. Within a given study setting, there are essentially two options for achieving an optimized accumulation of the required person-time experience: extending the time span over which the cohort is followed, or increasing the number of individuals sampled into the study base. Commonly, a combination of both is applied to adapt the strategy of study conduct to the limits of feasibility, practicability and financial constraints.

In fact, the definition and enumeration of a cohort is often laborious and time consuming, and it needs proper planning and preparation; yet it is often only the first, and not necessarily the most demanding, of a long sequence of steps that follow. A variety of factors affect the course, and thus the tangible and intangible expenses, of a cohort study. Adding to the problems of rare exposures and low incidence mentioned above, the lag times or induction periods for the occurrence of different endpoint conditions may be long and have to be accommodated in the study plan. The intensity and frequency of the follow-up examinations, participant contacts and record validations required to ascertain these endpoints also have to be taken into consideration. Furthermore, the migratory mobility of the cohort may adversely affect the ability to trace all individuals in the study sample effectively, resulting in high proportions of loss to follow-up. The planning and execution of cohort studies therefore frequently poses a major challenge to epidemiological research.

The number of prospective cohort studies presently conducted is nevertheless considerable. Numerous modifications of the prospective design have also been proposed and applied, mostly with the aim of making more efficient use of study data, such as nested case–control or case–cohort studies. Cohorts are also often restricted to individuals with special exposures, for example in settings of potential occupational or environmental risk, or to patients with certain clinical conditions, such as cohorts of HIV patients. Readers interested in the details of specific studies are referred to the Cohort Profiles that have become an established feature of this Journal in recent years.

Despite, or rather because of, the prevalent diversity and heterogeneity of individual cohort studies, researchers have started to join forces and pursue strategies that offer promising perspectives for resolving the tension between the demand for estimate precision and statistical power (i.e. numbers of events and person-time) and the inevitable obstacles imposed by limited resources. The solution is pooling of individual cohorts, either by meta-analysis of individual study estimates or by pooled analyses of individual study data. The idea is straightforward and tries to make optimal use of the prospective data available in the epidemiological research arena. This development has gained momentum and attracted supporters in many fields.
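The first of these two pooling strategies, meta-analysis of study-level estimates, can be made concrete with a small numerical sketch. The fragment below applies standard fixed-effect inverse-variance weighting to log hazard ratios from a handful of cohorts; the cohort names, effect sizes and standard errors are invented for illustration and are not taken from any of the studies discussed here.

```python
import math

# Hypothetical study-specific estimates: log hazard ratios and their
# standard errors from three cohorts (illustrative numbers only).
studies = [
    {"name": "Cohort A", "log_hr": 0.18, "se": 0.07},
    {"name": "Cohort B", "log_hr": 0.25, "se": 0.10},
    {"name": "Cohort C", "log_hr": 0.10, "se": 0.05},
]

# Fixed-effect inverse-variance pooling: each study is weighted by the
# reciprocal of its squared standard error, so more precise studies
# contribute more to the combined estimate.
weights = [1.0 / s["se"] ** 2 for s in studies]
pooled_log_hr = sum(w * s["log_hr"] for w, s in zip(weights, studies)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

# 95% confidence interval, reported on the hazard-ratio scale.
ci_low = math.exp(pooled_log_hr - 1.96 * pooled_se)
ci_high = math.exp(pooled_log_hr + 1.96 * pooled_se)
print(f"Pooled HR = {math.exp(pooled_log_hr):.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The alternative strategy, pooled analysis of individual-level data, fits a single model to the combined participant records (typically stratifying by study), which is more flexible but presupposes that the underlying data are compatible enough to be merged in the first place.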
There are, however, also trade-offs and limitations that need to be considered. In this regard, the methodological differences between individual cohorts are of particular relevance, because they may not be easily discounted. For example, variations in laboratory methods or in the definitions of endpoints may be too marked to permit pooled analyses. The simplicity of the approach may thus occasionally be seductive: ignoring such incongruence between studies can result in misleading analyses. The need for large quantities of high-quality, well-harmonized data has been recognized, however, and a group of investigators has recently joined in proposing the DataSHaPER approach to integrating data across studies. This approach, which aims to create the vast sample sizes needed for bioclinical studies, consists of two components that support the preparation of congruent protocols for data collection and provide a central reference to facilitate harmonization. The approach may be used prospectively, as a source and guide for creating harmonized questions for new studies, or retrospectively, as a structured framework for harmonizing existing studies. Of note, the benefit of having specific …
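To give a sense of what retrospective harmonization amounts to in practice, the toy sketch below maps two hypothetical studies' incompatible codings of smoking status onto a common target variable. It is emphatically not the DataSHaPER infrastructure itself, only an illustration of the kind of mapping such a framework has to formalize; all variable names and codings are invented.

```python
# Two existing studies recorded smoking status with different codings;
# both are mapped to a common target variable ("never"/"former"/"current").
STUDY_A_SMOKING = {1: "never", 2: "former", 3: "current"}          # numeric codes
STUDY_B_SMOKING = {"N": "never", "EX": "former", "Y": "current"}   # letter codes

def harmonize(record: dict, study: str) -> dict:
    """Return a participant record expressed in the common target format."""
    if study == "A":
        smoking = STUDY_A_SMOKING.get(record["smk"])
    elif study == "B":
        smoking = STUDY_B_SMOKING.get(record["smoking_status"])
    else:
        raise ValueError(f"unknown study: {study}")
    # None signals that the source value could not be mapped, i.e. the
    # variable is not harmonizable for this participant.
    return {"id": record["id"], "smoking": smoking}

print(harmonize({"id": 101, "smk": 3}, "A"))                 # {'id': 101, 'smoking': 'current'}
print(harmonize({"id": 202, "smoking_status": "EX"}, "B"))   # {'id': 202, 'smoking': 'former'}
```

The hard part in real collaborations is, of course, not writing such a mapping but deciding whether the underlying questions and measurements are similar enough for any mapping to be scientifically defensible.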
