Precise Gene Expression Measurement with Outlier Detection

Many approaches to Gene Expression Profiling (GEP) measurement have been suggested. However, most methods exhibit significant gross errors. There is a need to study and improve the precision of measurement methods. This work does so for one specific method, namely multiplex PCR with color-tagged module-shuffling primers [3, 1, 2]. It was designed for the precise measurement of small differences in gene expression. However, this method performs only a single measurement of each gene in each sample, and such data cannot be used to assess precision. The original method used three dyes, each measuring a different sample. We used these three dyes on the same sample to assess measurement quality. This revealed many outliers (wrong measurements), i.e., measurements differing by more than a factor of two from the measured value. This paper analyses this three-sample GEP measuring system from a statistical viewpoint. The aim is (1) to determine the precision of measurement and the sources of errors, (2) to develop a method which identifies outliers, and (3) to suggest modifications that improve precision and reproducibility. The data processing was optimized for the given technology. However, the principles, such as the transposition of fluorescent dyes for outlier detection, data projection for normalization, and clustering (EM algorithm, Gaussian statistical model), can be applied to other technologies too.
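
To make the factor-of-two outlier notion concrete, the following is a minimal sketch, not the clustering-based method developed in this paper. It assumes that the three dye readings of one gene in the same sample are compared against their median; the function name `flag_outliers`, the parameter `max_fold`, the choice of the median as the reference value, and the example intensities are all illustrative assumptions.

```python
from statistics import median


def flag_outliers(readings, max_fold=2.0):
    """Flag readings that differ from the reference by more than max_fold.

    readings: positive intensities of one gene measured with each dye in
              the same sample (hypothetical input format).
    The reference is the median of the readings; this is an assumption of
    the sketch, not a definition taken from the paper.
    """
    if any(r <= 0 for r in readings):
        raise ValueError("intensities must be positive")
    ref = median(readings)
    flags = []
    for r in readings:
        # Symmetric fold change: always >= 1, regardless of direction.
        fold = max(r / ref, ref / r)
        flags.append(fold > max_fold)
    return flags


# Three dyes, same sample: the third reading is more than twice the median.
print(flag_outliers([100.0, 110.0, 260.0]))
```

A simple threshold like this only illustrates what counts as a gross error; it does not model the measurement noise, which is what the Gaussian model fitted with the EM algorithm is meant to address.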