Analyzing Feature Importance for Metabolomics Using Genetic Programming

The emerging, fast-developing field of metabolomics measures the abundance of small-molecule metabolites in body fluids to study the cellular processes through which the human body responds to genetic and environmental perturbations. Because metabolism is complex, metabolites and the cellular processes they represent can be correlated with one another and can contribute synergistically to a phenotypic status. Genetic programming (GP) offers powerful analytical tools for investigating the multifactorial causes of metabolic diseases. In this article, we analyze a population-based metabolomics dataset on osteoarthritis (OA) and develop a linear GP (LGP) algorithm that searches for classification models that best predict the disease outcome and identifies the metabolic markers most strongly associated with the disease. The LGP algorithm evolved prediction models with high accuracy, especially when the search was focused on a reduced feature set containing only potentially relevant metabolites. We also identified a set of key metabolic markers that may improve our understanding of the biochemistry and pathogenesis of OA.
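The abstract does not spell out the LGP representation or how marker importance is measured. As a rough illustration only, the Python sketch below shows one common way a linear GP classifier can be organized: each program is a sequence of register instructions, a sample is labeled a case when the output register is positive, and programs evolve by steady-state tournament selection. Everything specific here is an assumption rather than the authors' method: the register count `N_REG`, the `(dest, op, src1, src2)` instruction tuple, the 4-way tournament, mutation-only variation (no crossover), and the use of "number of programs that read a feature in effective code" as an importance proxy. The toy data are invented.

```python
import random
from collections import Counter

# Assumed register layout (hypothetical): sources below N_REG index calculation
# registers; sources >= N_REG index input features (feature = src - N_REG).
# Register 0 holds the program output.
N_REG = 4
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else a,  # protected division
}


def random_instruction(n_features, rng):
    """One instruction: (dest_register, operator, src1, src2)."""
    n_src = N_REG + n_features
    return (rng.randrange(N_REG), rng.choice(list(OPS)),
            rng.randrange(n_src), rng.randrange(n_src))


def execute(prog, x):
    """Run the instruction sequence on one sample and return register 0."""
    regs = [0.0] * N_REG
    for dest, op, s1, s2 in prog:
        a = regs[s1] if s1 < N_REG else x[s1 - N_REG]
        b = regs[s2] if s2 < N_REG else x[s2 - N_REG]
        regs[dest] = OPS[op](a, b)
    return regs[0]


def classify(prog, x):
    """Predict case (1) if the output is positive, control (0) otherwise."""
    return 1 if execute(prog, x) > 0 else 0


def accuracy(prog, X, y):
    return sum(classify(prog, x) == t for x, t in zip(X, y)) / len(y)


def effective_instructions(prog):
    """Backward pass that drops structural introns (code not affecting register 0)."""
    needed, kept = {0}, []
    for instr in reversed(prog):
        dest, _, s1, s2 = instr
        if dest in needed:
            needed.discard(dest)
            needed.update(s for s in (s1, s2) if s < N_REG)
            kept.append(instr)
    return kept[::-1]


def feature_counts(programs):
    """Count, per feature, how many programs read it in effective code."""
    counts = Counter()
    for prog in programs:
        counts.update({s - N_REG for _, _, s1, s2 in effective_instructions(prog)
                       for s in (s1, s2) if s >= N_REG})
    return counts


def evolve(X, y, pop_size=50, length=8, generations=500, seed=0):
    """Steady-state LGP with 4-way tournaments and point mutation only."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [[random_instruction(n_features, rng) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        idx = rng.sample(range(pop_size), 4)
        idx.sort(key=lambda i: accuracy(pop[i], X, y), reverse=True)
        # The two tournament winners replace the two losers with mutated copies.
        for winner, loser in ((idx[0], idx[2]), (idx[1], idx[3])):
            child = list(pop[winner])
            child[rng.randrange(length)] = random_instruction(n_features, rng)
            pop[loser] = child
    best = max(pop, key=lambda p: accuracy(p, X, y))
    return best, pop


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data (invented for illustration): label is 1 when feature 0 > feature 2.
    X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(100)]
    y = [1 if x[0] > x[2] else 0 for x in X]
    best, pop = evolve(X, y)
    print("training accuracy:", accuracy(best, X, y))
    print("feature usage:", feature_counts(pop).most_common())
```

Running the script prints the training accuracy of the best program and a ranking of features by how many evolved programs use them. Introns are removed before counting so that a feature appearing only in code that never reaches the output register gets no credit; on the toy data one would hope features 0 and 2 rank near the top, but a short run like this guarantees nothing.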
