Analysis of Microarray Gene Expression Data

Microarrays provide the biological research community with rich, sensitive, and detailed information on gene expression profiles. Gene expression profiling has proved useful for a wide variety of biological and biomedical problems, including the study of metabolic pathways, inference of the functions of unknown genes, diagnosis of disease states, and the development of individualized drug treatments through pharmacogenomics. Given the impact of microarray gene expression data on biological and biomedical research, advanced computational methods are needed to interpret and use the raw information that the technology produces. This paper reviews several main research directions and methods in the analysis of microarray gene expression data.
