Pareto-Optimal Methods for Gene Ranking

The massive scale and variability of gene microarray data create new and challenging problems of signal extraction, gene clustering, and data mining, especially for temporal gene profiles. Many data mining methods for finding interesting gene expression patterns are based on thresholding a single discriminant, e.g., the ratio of between-class to within-class variation, or the correlation with a template. Here a different approach is introduced for extracting information from gene microarrays. The approach is based on multiple-objective optimization, and we call it Pareto front analysis (PFA). This method ranks genes according to the estimated probability that each gene is Pareto-optimal, i.e., that it lies on the Pareto front of the multiple-objective scattergram. Both a model-driven Bayesian Pareto method and a data-driven non-parametric Pareto method, based on rank-order statistics, are presented. The methods are illustrated for two gene microarray experiments.
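
As a rough illustration of what "lying on the Pareto front" means, the sketch below marks the non-dominated genes in a table of per-gene criterion scores, then estimates how often each gene lands on the front when its replicates are resampled. It is only a minimal sketch under stated assumptions: larger scores are taken to be better on every criterion, the function names `pareto_front_mask` and `pareto_front_frequency` are hypothetical, and the bootstrap over replicates is a simple stand-in for illustration, not the Bayesian or rank-order procedures presented in the paper.

```python
import numpy as np


def pareto_front_mask(scores):
    """Mark non-dominated rows of an (n_genes, n_criteria) score array.

    Assumption: larger is better on every criterion.
    """
    scores = np.asarray(scores, dtype=float)
    on_front = np.ones(scores.shape[0], dtype=bool)
    for i in range(scores.shape[0]):
        # Gene j dominates gene i if it is at least as good on every
        # criterion and strictly better on at least one.
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            on_front[i] = False
    return on_front


def pareto_front_frequency(replicates, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each gene is on the front.

    replicates: array of shape (n_genes, n_replicates, n_criteria).
    Illustrative resampling scheme only, not the method of the paper.
    """
    replicates = np.asarray(replicates, dtype=float)
    n_genes, n_rep, _ = replicates.shape
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_genes)
    for _ in range(n_boot):
        # Resample replicate indices with replacement, then average.
        idx = rng.integers(0, n_rep, size=n_rep)
        mean_scores = replicates[:, idx, :].mean(axis=1)
        counts += pareto_front_mask(mean_scores)
    return counts / n_boot


# Toy data: five hypothetical genes scored on two criteria.
toy_scores = np.array([[1, 5], [2, 4], [3, 3], [2, 2], [0, 0]], dtype=float)
toy_mask = pareto_front_mask(toy_scores)
```

Genes can then be ranked by sorting the returned frequencies in decreasing order. The pairwise dominance check costs on the order of the squared number of genes per resample, which is acceptable for a sketch but may need a faster front-finding routine for full-size arrays.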
