Integrated Learning: Screening Optimal Biomarkers for Identifying Preeclampsia in Placental mRNA Samples

Preeclampsia (PE) is a pregnancy disorder that can cause maternal and child deaths, and current treatment and preventive measures remain inadequate, so PE screening has attracted considerable attention. The purpose of this study is to screen placental mRNA for the best biomarkers for identifying patients with PE. We used the limma package in R to select the 48 genes with the largest differential expression, and then applied correlation-based feature selection to reduce dimensionality and avoid the attribute redundancy that arises when too many mRNA features participate in classification. After this reduction, the remaining mRNA features were ranked in descending order of information gain. We designed a classifier model that identifies whether a sample comes from a PE pregnancy based on its placental mRNA. To improve classification accuracy and avoid overfitting, three classifiers (C4.5, AdaBoost, and a multilayer perceptron) were combined by majority voting. For comparison, the ensemble was trained both on the differentially expressed genes and on the genes selected by the best-subset method. The results show that classification accuracy increased from 79% to 82.2% while the number of mRNA features decreased from 48 to 13. This study points to candidate placental mRNA biomarkers for PE and suggests directions for its screening and treatment.
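The abstract relies on three standard building blocks: a correlation-based feature selection (CFS) merit score, information-gain ranking of features, and majority voting over several classifiers. The following is a minimal, standard-library-only Python sketch of those three pieces, not the authors' implementation. It assumes that expression values have already been discretized (information gain as written here needs categorical feature values; the abstract does not say how this was done) and that each classifier's predictions are available as a list of labels. All function names and the toy data are hypothetical and for illustration only.

```python
from __future__ import annotations

import math
from collections import Counter
from collections.abc import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discretized feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_by_information_gain(
    features: dict[str, Sequence[Hashable]], labels: Sequence[Hashable]
) -> list[tuple[str, float]]:
    """Return (feature name, IG) pairs sorted from largest to smallest IG."""
    scores = {name: information_gain(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def cfs_merit(k: int, mean_feature_class_corr: float, mean_feature_feature_corr: float) -> float:
    """CFS merit of a k-feature subset (Hall's heuristic).

    High feature-class correlation raises the merit; high feature-feature
    correlation (redundancy) lowers it.
    """
    return k * mean_feature_class_corr / math.sqrt(
        k + k * (k - 1) * mean_feature_feature_corr
    )


def majority_vote(*predictions: Sequence[Hashable]) -> list[Hashable]:
    """Combine per-sample predictions from several classifiers by simple majority.

    On a tie, Counter.most_common returns the label seen first; with three
    voters and two classes a tie cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = ["PE", "PE", "PE", "ctrl", "ctrl", "ctrl"]
    features = {
        "gene_A": ["high", "high", "high", "low", "low", "low"],  # separates classes
        "gene_B": ["high", "low", "high", "low", "high", "low"],  # weakly informative
    }
    print(rank_by_information_gain(features, labels))  # gene_A ranks first

    # Hypothetical predictions from three classifiers on three samples.
    c45 = ["PE", "ctrl", "PE"]
    adaboost = ["PE", "PE", "ctrl"]
    mlp = ["ctrl", "PE", "PE"]
    print(majority_vote(c45, adaboost, mlp))  # prints ['PE', 'PE', 'PE']
```

With an odd number of classifiers on a two-class problem (PE versus control), hard majority voting always yields a decision, which is one reason three voters is a convenient ensemble size.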
