Microarray Data Analysis

Microarray analysis is an emerging field that draws on advances in semiconductor manufacturing, biochemistry, medicine, computation, and algorithms research. Microarrays provide a platform for an unprecedented genome-wide view of a biological sample, and microarray analysis makes use of the vast amounts of data this platform produces. By combining mathematical algorithms with clinical validation, microarray analysis offers a real opportunity to realize the goal of targeted, personalized medicine. One day, the information from a single microarray might tell a doctor whether a patient has cancer, what type of cancer it is, what the prognosis is, and which drug is most likely to be effective against it. The foundation for this vision is being built in laboratories around the world today, and it starts with sound microarray analysis.

Microarray analysis is a multistep process that converts raw microarray data into biomarkers for clinical use. First, noise must be removed from the raw data using preprocessing methods such as normalization and artifact removal. The clean data can then be used to select important features (genes) or to build predictive rules called classifiers. Feature selection and classification yield lists of biomarkers suitable for assigning samples to groups such as benign or malignant. These biomarkers must then be validated, either clinically or through knowledge-based approaches. Finally, the results of validation can be fed back to select better features or build better classifiers.

Keywords: microarray analysis; pattern recognition; bioinformatics; cancer; personalized medicine; biomarker; DNA; RNA; computational biology
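The first stages of the multistep process above (preprocess, select features, classify) can be illustrated with a minimal sketch in standard-library Python. It makes several assumptions that the article does not specify. A log2 transform followed by per-array median centering stands in for normalization. Welch's t statistic ranks genes for feature selection. A nearest-centroid rule serves as the classifier. These are common textbook choices, not the article's prescribed methods. All function names and the toy data are hypothetical. Artifact removal, clinical validation, and the feedback loop are not modeled.

```python
import math
import statistics

def log2_median_normalize(intensities):
    """Preprocessing (assumed choice): log2-transform one array's raw
    intensities, then subtract that array's median so arrays with different
    overall brightness become comparable."""
    logged = [math.log2(x) for x in intensities]
    med = statistics.median(logged)
    return [v - med for v in logged]

def welch_t(a, b):
    """Welch's t statistic comparing one gene's values in two groups."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = math.sqrt(va / len(a) + vb / len(b))
    if denom == 0:
        return 0.0  # avoid division by zero; simplification: zero-spread genes score 0
    return (statistics.mean(a) - statistics.mean(b)) / denom

def select_features(samples, labels, k):
    """Feature selection: rank genes by |t| between class 0 and class 1
    (labels assumed to be 0/1) and return the indices of the top k genes."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        group0 = [s[g] for s, y in zip(samples, labels) if y == 0]
        group1 = [s[g] for s, y in zip(samples, labels) if y == 1]
        scores.append(abs(welch_t(group0, group1)))
    ranked = sorted(range(n_genes), key=lambda g: -scores[g])  # stable on ties
    return ranked[:k]

class NearestCentroid:
    """Classifier: assign a sample to the class whose mean profile over the
    selected genes is closest in squared Euclidean distance."""

    def fit(self, samples, labels, genes):
        self.genes = genes
        self.centroids = {}
        for c in sorted(set(labels)):
            rows = [[s[g] for g in genes] for s, y in zip(samples, labels) if y == c]
            self.centroids[c] = [statistics.mean(col) for col in zip(*rows)]
        return self

    def predict(self, sample):
        x = [sample[g] for g in self.genes]

        def dist2(c):
            return sum((xi - ci) ** 2 for xi, ci in zip(x, self.centroids[c]))

        return min(self.centroids, key=dist2)

if __name__ == "__main__":
    # Hypothetical toy data: 6 arrays x 5 genes; gene 0 is a planted marker.
    raw = [
        [1, 8, 8, 16, 4], [4, 16, 32, 16, 8], [1, 16, 8, 8, 4],   # class 0
        [64, 8, 8, 16, 4], [32, 8, 16, 8, 4], [64, 16, 8, 8, 4],  # class 1
    ]
    labels = [0, 0, 0, 1, 1, 1]
    data = [log2_median_normalize(r) for r in raw]
    genes = select_features(data, labels, k=1)
    model = NearestCentroid().fit(data, labels, genes)
    new_array = log2_median_normalize([64, 8, 8, 8, 4])
    print(genes, model.predict(new_array))  # prints [0] 1
```

Each stage is deliberately simple so that the pipeline structure stays visible. Real studies commonly use more robust normalization, such as lowess or quantile normalization, and correct for multiple testing when ranking thousands of genes. Feature selection should also be repeated inside each cross-validation fold so that validation estimates are not optimistically biased.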
