Interactive Semisupervised Learning for Microarray Analysis

Microarray technology has generated vast amounts of gene expression data with distinct patterns. Based on the premise that genes of correlated functions tend to exhibit similar expression patterns, various machine learning methods have been applied to capture these specific patterns in microarray data. However, the discrepancy between the rich expression profiles and the limited knowledge of gene functions has been a major hurdle to the understanding of cellular networks. To bridge this gap so as to properly comprehend and interpret expression data, we introduce relevance feedback to microarray analysis and propose an interactive learning framework to incorporate the expert knowledge into the decision module. In order to find a good learning method and solve two intrinsic problems in microarray data, high dimensionality and small sample size, we also propose a semisupervised learning algorithm: kernel discriminant-EM (KDEM). This algorithm efficiently utilizes a large set of unlabeled data to compensate for the insufficiency of a small set of labeled data and it extends the linear algorithm in discriminant-EM (DEM) to a kernel algorithm to handle nonlinearly separable data in a lower dimensional space. The relevance feedback technique and KDEM together construct an efficient and effective interactive semisupervised learning framework for microarray analysis. Extensive experiments on the yeast cell cycle regulation data set and Plasmodium falciparum red blood cell cycle data set show the promise of this approach

[1]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[2]  Vojislav Kecman,et al.  Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-supervised, and Unsupervised Learning , 2006, Studies in Computational Intelligence.

[3]  R. Karp,et al.  From the Cover : Conserved patterns of protein interaction in multiple species , 2005 .

[4]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[5]  D. Eisenberg,et al.  Use of Logic Relationships to Decipher Protein Network Organization , 2004, Science.

[6]  Nicu Sebe,et al.  A new analysis of the value of unlabeled data in semi-supervised learning for image retrieval , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[7]  J. Derisi,et al.  The Transcriptome of the Intraerythrocytic Developmental Cycle of Plasmodium falciparum , 2003, PLoS biology.

[8]  Gunnar Rätsch,et al.  Constructing Descriptive and Discriminative Nonlinear Features: Rayleigh Coefficients in Kernel Feature Spaces , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Yufeng Wang,et al.  Data-mining approaches reveal hidden families of proteases in the genome of malaria parasite. , 2003, Genome research.

[10]  Thomas S. Huang,et al.  Relevance feedback in image retrieval: A comprehensive review , 2003, Multimedia Systems.

[11]  Yasufumi Takama,et al.  Genetic algorithm-based relevance feedback for image retrieval using local similarity patterns , 2003, Inf. Process. Manag..

[12]  See-Kiong Ng,et al.  On combining multiple microarray studies for improved functional classification by whole-dataset feature selection. , 2003, Genome informatics. International Conference on Genome Informatics.

[13]  M. Gerstein,et al.  Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. , 2002, Genome research.

[14]  Jonathan E. Allen,et al.  Genome sequence of the human malaria parasite Plasmodium falciparum , 2002, Nature.

[15]  S. Dudoit,et al.  Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[16]  H. Kitano Systems Biology: A Brief Overview , 2002, Science.

[17]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[18]  Thomas A. Darden,et al.  Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method , 2001, Bioinform..

[19]  Thomas S. Huang,et al.  Small sample learning during multimedia retrieval using BiasMap , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[20]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[21]  M. Ringnér,et al.  Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks , 2001, Nature Medicine.

[22]  Gunnar Rätsch,et al.  An introduction to kernel-based learning algorithms , 2001, IEEE Trans. Neural Networks.

[23]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[24]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[25]  Ingemar J. Cox,et al.  The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments , 2000, IEEE Trans. Image Process..

[26]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[27]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[28]  Thomas S. Huang,et al.  Optimizing learning in image retrieval , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[29]  Qi Tian,et al.  Discriminant-EM algorithm with application to image retrieval , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[30]  B. Scholkopf,et al.  Fisher discriminant analysis with kernels , 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468).

[31]  Erkki Oja,et al.  PicSOM: self-organizing maps for content-based image retrieval , 1999, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339).

[32]  W. Eric L. Grimson,et al.  A framework for learning query concepts in image classification , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[33]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Satoshi Omura,et al.  Proteasome Inhibitors Block Development ofPlasmodium spp , 1998, Antimicrobial Agents and Chemotherapy.

[35]  Thomas S. Huang,et al.  Relevance feedback: a power tool for interactive content-based image retrieval , 1998, IEEE Trans. Circuits Syst. Video Technol..

[36]  Christos Faloutsos,et al.  MindReader: Querying Databases Through Multiple Examples , 1998, VLDB.

[37]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[38]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[39]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[40]  Tom Minka,et al.  Modeling user subjectivity in image libraries , 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[41]  Toshikazu Kato,et al.  Learning of personal visual impression for image database systems , 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[42]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .