Identifying cancer targets based on machine learning methods via Chou's 5-steps rule and general pseudo components.

In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental, and lifestyle factors should be studied together in cancer research, given the complexity and varied forms of the disease. The increasing availability and growth rate of 'big data' derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods to cancer big data, including artificial neural networks, support vector machines, ensemble learning, and naïve Bayes classifiers.


[212]  S Joshua Swamidass,et al.  A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data , 2018, Nature Genetics.

[213]  K.-C. Chou,et al.  Using string kernel to predict signal peptide cleavage site based on subsite coupling model , 2005, Amino Acids.

[214]  Chao Dai,et al.  An integrative modular approach to systematically predict gene-phenotype associations , 2010, BMC Bioinform..

[215]  Guo-Ping Zhou,et al.  The structural determinations of the leucine zipper coiled-coil domains of the cGMP-dependent protein kinase Iα and its interaction with the myosin binding subunit of the myosin light chains phosphase. , 2011, Protein and peptide letters.

[216]  Ferat Sahin,et al.  A survey on feature selection methods , 2014, Comput. Electr. Eng..

[217]  Fan Yang,et al.  iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC , 2018, Bioinform..

[218]  Kang Li,et al.  Classification and Identification of Differential Gene Expression for Microarray Data: Improvement of the Random Forest Method , 2008, 2008 2nd International Conference on Bioinformatics and Biomedical Engineering.

[219]  Kuo-Chen Chou,et al.  pSSbond-PseAAC: Prediction of disulfide bonding sites by integration of PseAAC and statistical moments. , 2019, Journal of theoretical biology.

[220]  Kenji Doya,et al.  Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces , 2013, Front. Neurorobot..

[221]  K. Chou,et al.  iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. , 2013, Analytical biochemistry.

[222]  Sounak Chakraborty,et al.  Bayesian binary kernel probit model for microarray based cancer classification and gene selection , 2009, Comput. Stat. Data Anal..