Bayesian Network Learning with Feature Abstraction for Gene-drug Dependency Analysis

Combined analysis of microarray and drug-activity datasets has the potential to reveal valuable knowledge about the relations between gene expression and drug activity in malignant cells. In this paper, we apply Bayesian networks, a tool for compactly representing a joint probability distribution, to such analysis. To alleviate the data dimensionality problem, the huge datasets are condensed using a feature abstraction technique. The proposed analysis method was applied to the NCI60 dataset (http://discover.nci.nih.gov), which consists of gene expression profiles and drug activity patterns on human cancer cell lines. The Bayesian networks learned from the condensed dataset identified most of the salient pairwise correlations and some known relationships among several features in the original dataset, confirming the effectiveness of the proposed feature abstraction method. In addition, a survey of the recent literature confirms that several relationships appearing in the learned Bayesian network are biologically meaningful.
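
For reference, the compactness of a Bayesian network comes from the standard factorization of the joint distribution over a directed acyclic graph. The notation below (variables X_1, ..., X_n standing for gene-expression or drug-activity features, and Pa(X_i) for the parents of X_i in the graph) is introduced here for illustration and is not taken from the paper itself.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

As a rough illustration under the assumption of binary variables with at most k parents each, this form needs at most n * 2^k free parameters, whereas an unrestricted joint table needs 2^n - 1.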
