SED, a normalization free method for DNA microarray data analysis

Background

Analysis of DNA microarray data usually begins with a normalization step in which intensities from different arrays are adjusted to the same scale so that intensity levels from different arrays can be compared with one another. Both simple methods based on total array intensity and more complex "local intensity level" dependent normalization methods have been developed, some of which are widely used. Much less developed are methods that bypass the normalization step and therefore yield results that are not confounded by potential normalization errors.

Results

Instead of focusing on the raw intensity levels, we developed a new method for microarray data analysis that maps each gene's expression intensity level to a high-dimensional space of SEDs (Signs of Expression Difference), the signs of the expression intensity difference between a given gene and every other gene on the array. Since SEDs are unchanged under any monotonically increasing transformation of intensity levels, the SED-based method is normalization free. When tested on a multi-class tumor classification problem, simple Naive Bayes and Nearest Neighbor methods using the SED approach gave results comparable with normalized intensity-based algorithms. Furthermore, a high percentage of classifiers based on a single gene's SED gave good classification results, suggesting that SED does capture essential information from the intensity levels.

Conclusion

The results of testing this new method on multi-class tumor classification problems suggest that the SED-based, normalization-free method of microarray data analysis is feasible and promising.
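The SED mapping described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sed_vector` and `sed_matrix`, the toy intensity values, and the choice to encode ties as 0 are all assumptions made for illustration. The sketch computes, for one array, the sign of the intensity difference between a given gene and every other gene, and then checks that a strictly increasing rescaling of the intensities leaves those signs unchanged.

```python
import math


def sed_vector(intensities, gene):
    """Signs of expression difference between `gene` and every other gene
    on a single array: +1 if higher, -1 if lower, 0 on a tie (assumed)."""
    x = intensities[gene]
    return [
        (x > y) - (x < y)  # evaluates to +1, 0, or -1
        for j, y in enumerate(intensities)
        if j != gene
    ]


def sed_matrix(intensities):
    """SED vector for every gene on the array (hypothetical helper)."""
    return [sed_vector(intensities, g) for g in range(len(intensities))]


# Toy intensities for four genes on one array (made-up values).
raw = [120.0, 35.5, 980.0, 410.2]

# Any strictly increasing transformation stands in for a normalization step.
rescaled = [math.log2(3.0 * v + 10.0) for v in raw]

# The signs depend only on the ordering of genes within the array,
# so they are identical before and after rescaling.
assert sed_matrix(raw) == sed_matrix(rescaled)

print(sed_vector(raw, 0))  # [1, -1, -1]
```

Because the signs depend only on the within-array ordering of the genes, two arrays on different intensity scales can be compared through their SED vectors without first being brought to a common scale.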
