The Characterization of Structure and Prediction for Aquaporin in Tumour Progression by Machine Learning

Recurrence and new cases of cancer constitute a challenging human health problem. Aquaporins (AQPs) are expressed in many types of tumours, including those of the brain, breast, pancreas, colon, skin, ovaries, and lungs, and the histological grade of cancer is positively correlated with AQP expression. The identification of aquaporins is therefore worth exploring, and computational tools play an important role in it. In this study, we propose iAQPs-RF, a reliable, accurate and automated sequence-based predictor for identifying AQPs. Features were extracted with the 188D method (global protein sequence descriptor, GPSD). Six common classifiers, namely random forest (RF), naive Bayes (NB), support vector machine (SVM), XGBoost, logistic regression (LR) and decision tree (DT), were used for AQP classification. The classification results show that RF is the most suitable of these machine learning algorithms, with an accuracy of 97.689%. Analysis of variance (ANOVA) was used to analyse the features: they were ranked by the ANOVA method, and an incremental feature selection (IFS) strategy was applied to search for the optimal feature subset. The classification results suggest that the 26th feature (neutral/hydrophobic) and the 21st feature (hydrophobic) are the two most powerful and informative features for distinguishing AQPs from non-AQPs. Previous studies reported that plasma membrane proteins have hydrophobic characteristics. Subcellular localization prediction showed that all aquaporins were plasma membrane proteins with highly conserved transmembrane structures, and the 3D structure of aquaporins was consistent with the localization results. Together, these results support the conclusion that aquaporins possess hydrophobic properties. Although aquaporins have highly conserved transmembrane structures, the phylogenetic tree shows their diversity during evolution. Principal component analysis (PCA) showed that positive and negative samples were well separated by the 54D feature set, indicating that these 54 features can effectively classify aquaporins. The online prediction server is accessible at http://lab.malab.cn/~acy/iAQP.
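
The ANOVA ranking and IFS search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy three-feature data set, and the nearest-centroid evaluator (a simple stand-in for the random forest used in the study) are all assumptions made for illustration. Each feature is scored by its one-way ANOVA F-statistic, the features are sorted by score, and IFS adds them one at a time while keeping the best-scoring prefix.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across label groups."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    n, k = len(values), len(groups)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, y):
    """Feature indices sorted by decreasing F-statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X, y):
    """Hypothetical evaluator: leave-one-out accuracy of a nearest-centroid
    classifier (a stand-in for the random forest, not the paper's model)."""
    hits = 0
    for i in range(len(X)):
        train = [(row, lab) for t, (row, lab) in enumerate(zip(X, y)) if t != i]
        centroids = {}
        for lab in {lab for _, lab in train}:
            rows = [row for row, other in train if other == lab]
            centroids[lab] = [mean(col) for col in zip(*rows)]
        pred = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(X[i], centroids[lab])),
        )
        hits += pred == y[i]
    return hits / len(X)


def incremental_feature_selection(X, y, evaluate=centroid_accuracy):
    """Add ranked features one at a time; keep the first best-scoring prefix."""
    ranked = rank_features(X, y)
    best_score, best_subset = float("-inf"), []
    for m in range(1, len(ranked) + 1):
        subset = ranked[:m]
        score = evaluate([[row[j] for j in subset] for row in X], y)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score


# Toy data (invented): feature 0 separates the classes, feature 1 is
# uninformative, feature 2 is weakly informative.
X_toy = [
    [0.9, 0.5, 0.3], [0.8, 0.1, 0.6], [1.0, 0.9, 0.4], [0.7, 0.4, 0.7],
    [0.1, 0.6, 0.2], [0.2, 0.2, 0.5], [0.0, 0.8, 0.1], [0.3, 0.3, 0.4],
]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    print(rank_features(X_toy, y_toy))
    print(incremental_feature_selection(X_toy, y_toy))
```

In this sketch, ties in the IFS score are resolved in favour of the smaller feature subset, which is one common convention. With real 188D descriptors the evaluator would typically be a cross-validated classifier rather than the toy one used here.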
