The LeFE algorithm: embracing the complexity of gene expression in the interpretation of microarray data

Interpretation of microarray data remains a challenge, and most methods fail to consider the complex, nonlinear regulation of gene expression. To address that limitation, we introduce Learner of Functional Enrichment (LeFE), a statistical/machine learning algorithm based on Random Forest, and demonstrate it on several diverse datasets: smoker versus never-smoker status, breast cancer classification, and cancer drug sensitivity. We also compare it with previously published algorithms, including Gene Set Enrichment Analysis. LeFE regularly identifies statistically significant functional themes consistent with known biology.
