Prediction of K562 cells functional inhibitors based on machine learning approaches.

β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in the β-globin gene, which reduces synthesis of the β-globin chain and results in a relative excess of α-chains. The excess α-chains form inclusion bodies that deposit on the cell membrane, reducing the deformability of red blood cells and leading to their massive destruction in the spleen; β-thalassemia therefore belongs to a group of hereditary haemolytic diseases. In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Using AdaBoost, the overall accuracy (ACC) was 83.1% in a 10-fold cross-validation test and 78.0% in an independent set test, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. These results suggest that AdaBoost can be applied to build a learning model for the prediction of inhibitors against K562 cells.
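The evaluation described above (AdaBoost scored by 10-fold cross-validation accuracy) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the decision-stump weak learner, the coarse threshold grid, the number of boosting rounds, and the synthetic Gaussian features produced by the hypothetical helper `make_toy_data` are all assumptions made for illustration. Only the class sizes (117 inhibitors, 190 non-inhibitors) come from the text.

```python
import math
import random

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        step = max(1, len(values) // 16)  # assumption: coarse grid, only for speed
        for t in values[::step]:
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= t else -pol) != yi)
                if err < best[0]:
                    best = (err, j, t, pol)
    return best

def stump_predict(stump, row):
    _, j, t, pol = stump
    return pol if row[j] >= t else -pol

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in model)
    return 1 if score >= 0 else -1

def cross_val_accuracy(X, y, k=10, rounds=10, seed=0):
    """Overall accuracy (ACC) pooled over k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = adaboost_fit([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(adaboost_predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)

def make_toy_data(n_pos=117, n_neg=190, n_features=3, seed=0):
    """Hypothetical stand-in for molecular descriptors: shifted Gaussian features."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n, shift in ((1, n_pos, 1.0), (-1, n_neg, -1.0)):
        for _ in range(n):
            X.append([rng.gauss(shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"10-fold ACC on toy data: {cross_val_accuracy(X, y):.3f}")
```

The accuracy printed for the toy data says nothing about the reported 83.1% and 78.0%; the sketch only shows how a pooled 10-fold ACC is computed for a boosted classifier. An independent set test would instead fit the model once on the training compounds and score it on compounds held out from the start.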
