Generation of Comprehensible Hypotheses from Gene Expression Data

Machine learning techniques have been recognized as powerful tools for the analysis of gene expression data. However, most learning techniques used for class prediction in gene expression analysis in recent years generate black-box models. Although the prediction accuracy of these models can be very high, they provide little insight into the underlying biology. This paper takes the view that a more reasonable role for machine learning techniques is to generate hypotheses that human experts can verify or refine, rather than to make decisions for them. Based on this view, a general approach for generating comprehensible hypotheses from gene expression data is described and applied to human acute leukemias as a test case. The results demonstrate the feasibility of using machine learning techniques to help form hypotheses about the relationships between genes and certain diseases.
