Interactive Semisupervised Learning for Microarray Analysis

Microarray technology has generated vast amounts of gene expression data with distinct patterns. Based on the premise that genes with correlated functions tend to exhibit similar expression patterns, various machine learning methods have been applied to capture these patterns in microarray data. However, the discrepancy between the rich expression profiles and the limited knowledge of gene functions has been a major hurdle to understanding cellular networks. To bridge this gap and properly interpret expression data, we introduce relevance feedback to microarray analysis and propose an interactive learning framework that incorporates expert knowledge into the decision module. To address two intrinsic problems in microarray data, high dimensionality and small sample size, we also propose a semisupervised learning algorithm: kernel discriminant-EM (KDEM). This algorithm uses a large set of unlabeled data to compensate for the insufficiency of a small set of labeled data, and it extends the linear algorithm in discriminant-EM (DEM) to a kernel algorithm to handle nonlinearly separable data in a lower dimensional space. Together, the relevance feedback technique and KDEM form an efficient and effective interactive semisupervised learning framework for microarray analysis. Extensive experiments on the yeast cell cycle regulation data set and the Plasmodium falciparum red blood cell cycle data set show the promise of this approach.
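The idea of letting unlabeled expression profiles compensate for scarce labels can be illustrated with a much simpler stand-in than KDEM itself. The sketch below is not the KDEM algorithm: it is a kernel nearest-class-mean classifier refined by EM-style soft assignments. The function names, the RBF kernel, and the `gamma` and `temperature` parameters are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF (Gaussian) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_em_posteriors(X_lab, y_lab, X_unl, n_classes=2,
                         gamma=1.0, temperature=0.1, n_iter=20):
    """Illustrative semisupervised loop (NOT the paper's KDEM).

    Labeled rows keep fixed one-hot responsibilities; unlabeled rows get
    soft class responsibilities that are re-estimated each iteration from
    distances to class means in the kernel-induced feature space.
    Assumes every class has at least one labeled example.
    Returns the (n_unlabeled, n_classes) posterior matrix.
    """
    X = np.vstack([X_lab, X_unl])
    K = rbf_kernel(X, X, gamma)
    n_lab = len(X_lab)

    # Responsibilities: one-hot for labeled data, zero for unlabeled data,
    # so the first class means are estimated from labeled data only.
    R = np.zeros((len(X), n_classes))
    R[:n_lab] = np.eye(n_classes)[y_lab]

    for _ in range(n_iter):
        # M-step: class means are weighted averages of feature vectors.
        W = R / R.sum(0, keepdims=True)
        # ||phi(x_i) - m_c||^2 = K_ii - 2 (K W)_ic + (W^T K W)_cc
        mean_norms = np.einsum('ic,ij,jc->c', W, K, W)
        d2 = np.diag(K)[:, None] - 2.0 * K @ W + mean_norms[None, :]
        # E-step: softmax over negative distances gives soft assignments.
        logits = -d2 / temperature
        logits -= logits.max(1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(1, keepdims=True)
        R[n_lab:] = P[n_lab:]  # labeled responsibilities stay fixed

    return R[n_lab:]
```

In an interactive setting, one hypothetical way to use the returned posteriors is to show the expert the unlabeled genes with the least confident assignments, move the ones the expert annotates into the labeled set, and call the function again; that feedback loop is likewise an assumption, not a description of the framework evaluated in the paper.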
