A Directed Acyclic Graphical Approach and Ensemble Feature Selection for a Better Drug Development Strategy Using Partial Knowledge from KEGG Signalling Pathways

In this paper we consider the application of machine learning of graphical models and feature selection to developing better drug-design strategies. The work draws on partial prior knowledge available through the KEGG signalling pathway database, in tandem with our recently developed ensemble feature selection methods for a better regularisation of the lasso estimate. It adds an extra layer of previously unseen knowledge to KEGG signalling pathways by inferring the underlying connectivity between gene families implicated in breast cancer in the MAPK signalling pathway in response to the anti-cancer drug "neoadjuvant docetaxel".
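To make the idea of stabilising lasso-based selection concrete, the sketch below refits a lasso regression on bootstrap resamples and records how often each candidate gene receives a non-zero coefficient. This is not the ensemble method of the paper, whose details are not given in this abstract. It is a generic selection-frequency illustration under stated assumptions: the function names (`lasso_cd`, `selection_frequency`, `make_toy_data`), the penalty value, and the synthetic data are all hypothetical, and the lasso is solved with a plain coordinate-descent loop without an intercept.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def selection_frequency(X, y, lam, n_boot=20, seed=0):
    """Fraction of bootstrap resamples in which each feature has a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def make_toy_data(n=30, p=40, seed=1):
    """Synthetic p > n data: the target depends only on features 0 and 3 (an assumption)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 3.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


X, y = make_toy_data()
freq = selection_frequency(X, y, lam=0.5)
# Features selected in most resamples are candidate parents of the target gene.
stable = [j for j, f in enumerate(freq) if f >= 0.7]
print("stable features:", stable)
```

In a network-learning setting, one would run such a regression once per target gene, restricting the candidate predictors to genes that the prior pathway knowledge allows as parents. How the paper combines its ensemble scores with the KEGG prior is not described here, so that step is left out of the sketch.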
