A Self-supervised Learning Framework for Classifying Microarray Gene Expression Data

It is important to develop computational methods that can effectively resolve two intrinsic problems in microarray data: high dimensionality and small sample size. In this paper, we propose a self-supervised learning framework for classifying microarray gene expression data using the Kernel Discriminant-EM (KDEM) algorithm. The framework applies self-supervised learning techniques in an optimal nonlinear discriminating subspace. It uses a large set of unlabeled data to compensate for the insufficiency of a small set of labeled data, and it extends the linear algorithm in DEM to a kernel algorithm so that nonlinearly separable data can be handled in a lower-dimensional space. Extensive experiments on Plasmodium falciparum expression profiles show the promising performance of the approach.
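To make the idea concrete, the following is a minimal sketch of a KDEM-style loop. It is not the authors' implementation. It alternates a weighted kernel Fisher discriminant step with an EM-style update of soft labels for the unlabeled samples. The RBF kernel, the regularization constant, the shared-variance Gaussian class model in the projected space, the function names (`rbf_kernel`, `discriminant_step`, `expectation_step`, `kdem_sketch`) and the synthetic data are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def discriminant_step(K, W, reg=1e-3):
    """Weighted kernel Fisher discriminant on soft labels W (n, C).
    Returns expansion coefficients of shape (n, C - 1)."""
    n, C = W.shape
    Nc = W.sum(0)                              # soft class sizes
    means = (W.T @ K) / Nc[:, None]            # class means of kernel rows, (C, n)
    total = (W.sum(1) @ K) / W.sum()           # weighted overall mean, (n,)
    M = np.zeros((n, n))                       # between-class scatter
    N = reg * np.eye(n)                        # regularized within-class scatter
    for c in range(C):
        d = means[c] - total
        M += Nc[c] * np.outer(d, d)
        D = K - means[c]
        N += (D * W[:, [c]]).T @ D
    # Solve the generalized eigenproblem M a = lambda N a via Cholesky whitening.
    L = np.linalg.cholesky(N)
    S = np.linalg.solve(L, np.linalg.solve(L, M).T)   # L^-1 M L^-T
    _, V = np.linalg.eigh((S + S.T) / 2.0)            # eigenvalues in ascending order
    return np.linalg.solve(L.T, V[:, -(C - 1):])      # keep the top C - 1 directions

def expectation_step(Z, W):
    """Class posteriors in the projected space Z under an assumed Gaussian
    model with per-class means and a shared diagonal variance."""
    Nc = W.sum(0)
    mu = (W.T @ Z) / Nc[:, None]
    diff = Z[:, None, :] - mu[None, :, :]             # (n, C, d)
    var = (W[:, :, None] * diff ** 2).sum((0, 1)) / W.sum() + 1e-9
    logp = np.log(Nc / Nc.sum()) - 0.5 * (diff ** 2 / var).sum(2)
    logp -= logp.max(1, keepdims=True)                # stabilize the softmax
    P = np.exp(logp)
    return P / P.sum(1, keepdims=True)

def kdem_sketch(X_lab, y_lab, X_unl, n_classes, gamma=0.05, n_iter=5):
    """Return predicted class indices for the unlabeled samples."""
    X = np.vstack([X_lab, X_unl])
    n_lab = len(X_lab)
    K = rbf_kernel(X, X, gamma)
    W = np.zeros((len(X), n_classes))
    W[np.arange(n_lab), y_lab] = 1.0          # labeled rows stay one-hot
    for _ in range(n_iter):                   # first pass uses labeled data only
        A = discriminant_step(K, W)
        P = expectation_step(K @ A, W)
        W[n_lab:] = P[n_lab:]                 # update soft labels of unlabeled rows
    return W[n_lab:].argmax(1)

# Synthetic stand-in for expression profiles: 6 labeled and 40 unlabeled samples.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(0.0, 1.0, (3, 10)), rng.normal(3.0, 1.0, (3, 10))])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unl = np.vstack([rng.normal(0.0, 1.0, (20, 10)), rng.normal(3.0, 1.0, (20, 10))])
y_true = np.array([0] * 20 + [1] * 20)
pred = kdem_sketch(X_lab, y_lab, X_unl, n_classes=2)
accuracy = (pred == y_true).mean()
print("accuracy on unlabeled samples:", accuracy)
```

The sketch fits the first discriminant on the labeled samples alone and only then lets the unlabeled samples contribute through their soft labels. This is one simple way to keep uninformative initial labels from inflating the within-class scatter. The kernel width `gamma` and the regularizer `reg` would need tuning on real expression data.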
