Variation, Variability, Batches and Bias in Microarray Experiments: An Introduction

Microarray-based measurement of gene expression levels is a widely used technology in biological and medical research. The discussion around the impact of variability on the reproducibility of microarray data has captured the imagination of researchers ever since the invention of microarray technology in the mid 1990s. Variability has many sources of the most diverse kinds, and depending on how the experiment is performed it can manifest itself as a random factor or as a systematic factor, termed bias. Knowledge of the biological/medical as well as the practical background of a planned microarray experiment helps alleviate the impact of systematic sources of variability, but can hardly address random effects.

The invention of microarray technology in the mid 1990s allowed the simultaneous monitoring of the expression levels of thousands of genes (Brown and Botstein 1999; Lockhart et al. 1996; Schena et al. 1995). Microarray-based high-density/high-content gene expression technology is nowadays commonly used in fundamental biological and medical research to generate testable hypotheses on physiological processes and disease. It is designed to measure variation of expression due to biological, physiological, genetic and/or environmental conditions, and it allows us to study differences in gene expression induced by factors of interest, such as pharmacological and toxicological effects of compounds, environmental effects, growth and aging, and disease phenotypes. We note that the term ‘variation’ describes directly measurable differences among individuals or samples, while the term ‘variability’ refers to the potential to vary.

Batch Effects and Noise in Microarray Experiments: Sources and Solutions, edited by A.
Scherer. © 2009 John Wiley & Sons, Ltd.

As we shall see in more detail in Chapter 2, relative quantification of gene expression involves many steps, including sample handling, messenger RNA (mRNA) extraction, in-vitro reverse transcription, labeling of complementary RNA (cRNA) with fluorescent dyes, hybridization of the labeled cRNA (target) to oligonucleotides with complementary sequences (probes) which are immobilized on solid surfaces, and the measurement of the intensity of the fluorescent signal emitted by the labeled target. The measured signal intensity per target is a measure of the relative abundance of the particular mRNA species in the original biological sample.

Unfortunately, microarray technology has its caveats, as it is susceptible to variability like any other measurement process. As we will discuss in Chapters 2 and 3, technical variation manifests itself in signal intensity variability. This effect is informally called ‘noise’: technical components which are not part of the system under investigation but which, if they enter the system, lead to variability in the experimental outcomes. Note that noise is only defined in the context of technology.

Since the early years of microarrays, noise and its impact on the reliability of large-scale genomics data analysis have been a much discussed topic. Kerr et al. (2000b) were among the first to recognize the problem and to propose ANOVA methods to estimate noise in microarray data sets. Tu et al. (2002) addressed the issue of how to measure the impact of different sources of noise. Using a set of replicate arrays with varying degrees of preparation differences, they showed quantitatively that hybridization noise is very high compared to noise from sample preparation or amplification.
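The ANOVA approach mentioned above decomposes observed intensities into contributions from genes, arrays, and residual noise. The following sketch illustrates the idea on simulated log-scale data; the array count, effect sizes, and the simulation itself are hypothetical and not taken from Kerr et al.:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 replicate arrays x 1000 genes of log2 intensities.
n_arrays, n_genes = 4, 1000
gene_effect = rng.normal(8.0, 2.0, size=n_genes)        # per-gene expression level
array_effect = rng.normal(0.0, 0.3, size=n_arrays)      # systematic per-array bias
noise = rng.normal(0.0, 0.2, size=(n_arrays, n_genes))  # residual technical noise

y = gene_effect[None, :] + array_effect[:, None] + noise

# Two-way ANOVA decomposition (arrays x genes, one observation per cell).
grand = y.mean()
ms_array = n_genes * ((y.mean(axis=1) - grand) ** 2).sum() / (n_arrays - 1)
resid = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + grand
ms_resid = (resid ** 2).sum() / ((n_arrays - 1) * (n_genes - 1))

# Method-of-moments variance components: E[MS_array] = var_noise + n_genes * var_array.
var_noise = ms_resid                          # should recover roughly 0.2**2 here
var_array = (ms_array - ms_resid) / n_genes   # crude estimate with only 4 arrays

print(f"estimated noise variance:        {var_noise:.3f}")
print(f"estimated array-effect variance: {var_array:.3f}")
```

The residual variance component corresponds to what the text calls noise; the array component is one example of a systematic source of variability.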
They also found that the level of noise is signal intensity dependent, and proposed a method for significance testing based on noise characteristics.

The unresolved issues of measurement variability, and of how to measure that variability, dampened the great hopes researchers had with the advent of microarray technology and the human genome sequencing project. Since consensus technological, analytical, and reporting processes were (and still are) largely missing, it appeared that not only were gene expression data irreproducible, but also that the results were very much dependent on the choice of analytical methods. A lively discussion on the validity of microarray technology resulted in publications and comments like ‘Microarrays and molecular research: noise discovery?’ (Ioannidis 2005) and ‘An array of problems’ (Frantz 2005), countered by ‘Arrays of hope’ (Strauss 2006) and ‘In praise of arrays’ (Ying and Sarwal 2008), as well as publications that raised questions about the reproducibility of microarray data (Marshall 2004; Ein-Dor et al. 2006) or demonstrated increased reproducibility (Dobbin et al. 2005b; Irizarry et al. 2005; Larkin et al. 2005).

Shi et al. addressed this issue in a systematic manner and in 2006 published a comparative analysis of a large data set which had been generated by the MicroArray Quality Control (MAQC) consortium with 137 participants from 51 organizations (Shi et al. 2006). The data set consists of two commercially available RNA samples of high quality – Universal Human Reference RNA (UHRR) and Human Brain Reference RNA – which were mixed in four titration pools, and whose mRNA levels were measured on seven microarray platforms in addition to three alternative platforms. Each array platform was deployed at three test sites, and from each sample five replicates were assayed at each site. This information-rich data set is an excellent source for the investigation of technological noise, and some of its data will be used in a number of chapters in this book.
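Reproducibility across such replicate arrays is commonly summarized per gene by the coefficient of variation (CV: standard deviation divided by mean of the signal across replicates). A minimal sketch on simulated raw-scale intensities; the replicate count, noise level, and data are hypothetical, not the MAQC values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: 5 replicate arrays x 2000 genes, raw-scale signal intensities.
n_reps, n_genes = 5, 2000
true_signal = rng.lognormal(mean=6.0, sigma=1.0, size=n_genes)
# Multiplicative technical noise of roughly 10% per measurement.
replicates = true_signal[None, :] * rng.lognormal(0.0, 0.1, size=(n_reps, n_genes))

# Per-gene coefficient of variation across replicates.
cv = replicates.std(axis=0, ddof=1) / replicates.mean(axis=0)
print(f"median CV across genes: {np.median(cv):.3f}")  # near 0.1 by construction
```

The median of these per-gene CVs is the kind of summary statistic used to compare reproducibility across platforms and sites.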
The project showed that quantitative measures across all one-color array platforms had a median coefficient of