A trend pattern assessment approach to microarray gene expression profiling data analysis

We study the problem of assessing the reliability of a statistical measurement computed on a data set that contains an unknown amount of noise, inconsistencies, and outliers. We explore a practical approach that analyzes the dynamical patterns (trends) of the statistical measurement through a sequential extreme-boundary-point (EBP) weed-out process. We categorize the resulting weed-out trend patterns (WOTPs) and examine their relation to the reliability of the measurement. The approach is applied to extracting genes that are predictive of BCL2 translocations and of clinical survival outcomes in diffuse large B-cell lymphoma (DLBCL) from DNA microarray gene expression profiling data sets, with Fisher's Discriminant Criterion (FDC) as the statistical measurement. In the experiments conducted, the weed-out trend analysis (WOTA) approach is found to be effective for qualitatively assessing statistics-based measurements.
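The abstract does not define an extreme boundary point or list the trend categories, so the following is only a minimal Python sketch of the general idea under stated assumptions. FDC is taken in its standard two-class, single-gene form (m1 - m2)^2 / (s1^2 + s2^2). The weed-out step, `drop_extreme`, is a hypothetical stand-in for the EBP rule: it removes from each class the one sample farthest from that class's mean. The labels returned by the hypothetical `classify_trend` ("rising", "falling", "stable", "fluctuating") are illustrative and are not the paper's categories.

```python
from statistics import mean, pvariance
from typing import List, Sequence


def fisher_criterion(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Two-class, single-gene Fisher criterion (m_a - m_b)^2 / (s_a^2 + s_b^2).

    eps keeps the ratio finite when both classes have zero variance.
    """
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + eps)


def drop_extreme(values: List[float]) -> List[float]:
    """HYPOTHETICAL EBP rule: remove the one value farthest from its class mean."""
    m = mean(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return values[:idx] + values[idx + 1:]


def weed_out_trend(a: Sequence[float], b: Sequence[float],
                   steps: int, min_size: int = 3) -> List[float]:
    """Record the FDC before and after each weed-out step.

    Stops early once either class is down to min_size samples.
    """
    a, b = list(a), list(b)
    trend = [fisher_criterion(a, b)]
    for _ in range(steps):
        if len(a) <= min_size or len(b) <= min_size:
            break
        a, b = drop_extreme(a), drop_extreme(b)
        trend.append(fisher_criterion(a, b))
    return trend


def classify_trend(trend: Sequence[float], rel_tol: float = 0.05) -> str:
    """HYPOTHETICAL pattern labels from step-to-step changes relative to the first value."""
    if len(trend) < 2:
        return "undetermined"
    base = abs(trend[0]) or 1.0
    diffs = [(y - x) / base for x, y in zip(trend, trend[1:])]
    if all(abs(d) <= rel_tol for d in diffs):
        return "stable"
    if all(d >= -rel_tol for d in diffs) and trend[-1] > trend[0]:
        return "rising"
    if all(d <= rel_tol for d in diffs) and trend[-1] < trend[0]:
        return "falling"
    return "fluctuating"


if __name__ == "__main__":
    # Toy expression values for one gene; each class has one outlier pulled toward the other class.
    class_1 = [1.0, 1.1, 0.9, 1.0, 5.0]
    class_2 = [3.0, 3.1, 2.9, 3.0, -1.0]
    trend = weed_out_trend(class_1, class_2, steps=5)
    print(trend, classify_trend(trend))
```

In the toy run, the FDC is close to zero on the raw data and jumps once the two outliers are weeded out, which is the kind of dynamic the trend analysis is meant to expose. Removing one point per class per step keeps the class sizes balanced; trying a different EBP definition (for example, points lying where the two classes overlap) would only require replacing `drop_extreme`.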
