Distinguishing three subtypes of hematopoietic cells based on gene expression profiles using a support vector machine.

Hematopoiesis is a complex process in which a series of biological sub-processes give rise to the various blood components. In a widely accepted model, early hematopoiesis proceeds from long-term hematopoietic stem cells (LT-HSCs) to multipotent progenitors (MPPs) and then to lineage-committed progenitors. However, the molecular mechanisms of early hematopoiesis have not been fully characterized. In this study, we applied a computational strategy to identify gene expression signatures that distinguish three closely related types of hematopoietic cells collected in recent studies: (1) hematopoietic stem cell/multipotent progenitor cells; (2) LT-HSCs; and (3) hematopoietic progenitor cells. Each cell was represented by its expression profile across 20,475 genes. The expression features were analyzed with the Monte-Carlo Feature Selection (MCFS) method, which produced a ranked feature list. Incremental feature selection (IFS) was then combined with a support vector machine (SVM) trained with the sequential minimal optimization (SMO) algorithm to obtain the optimal classifier, which achieved the highest Matthews correlation coefficient (MCC) of 0.889 and used 6698 features to represent each cell. In addition, an updated implementation of the MCFS method yielded seventeen decision rules that classify the three cell types with an overall accuracy of 0.812. A literature review confirmed that both the rules and the top features used to build the optimal classifier are commonly used or potential biological markers for distinguishing the three cell types of hematopoietic stem and progenitor cells (HSPCs). This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
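
The IFS step and the multiclass MCC used to pick the optimal classifier can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy expression values, the feature ranking, and all function names are hypothetical, and a leave-one-out nearest-centroid classifier stands in for the SMO-trained SVM so that the example needs only the standard library. The MCC is computed with the usual K-category generalization of the binary coefficient.

```python
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """K-category Matthews correlation coefficient from paired labels."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {lab: i for i, lab in enumerate(labels)}
    k = len(labels)
    # confusion[i][j]: samples of true class i predicted as class j
    confusion = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        confusion[index[t]][index[p]] += 1
    n = len(y_true)
    correct = sum(confusion[i][i] for i in range(k))
    true_totals = [sum(row) for row in confusion]
    pred_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    numerator = correct * n - sum(t * p for t, p in zip(true_totals, pred_totals))
    denominator = sqrt(
        (n * n - sum(p * p for p in pred_totals))
        * (n * n - sum(t * t for t in true_totals))
    )
    return numerator / denominator if denominator else 0.0


def squared_distance(row, centroid, features):
    """Squared Euclidean distance restricted to the selected features."""
    return sum((row[f] - c) ** 2 for f, c in zip(features, centroid))


def nearest_centroid_loo(samples, labels, features):
    """Leave-one-out predictions; a simple stand-in for the SVM (assumption)."""
    predictions = []
    for held_out, row in enumerate(samples):
        grouped = {}
        for i, (other, lab) in enumerate(zip(samples, labels)):
            if i != held_out:
                grouped.setdefault(lab, []).append(other)
        # Per-class centroid computed without the held-out sample
        centroids = {
            lab: [sum(r[f] for r in rows) / len(rows) for f in features]
            for lab, rows in grouped.items()
        }
        predictions.append(
            min(sorted(centroids),
                key=lambda lab: squared_distance(row, centroids[lab], features))
        )
    return predictions


def incremental_feature_selection(ranked_features, evaluate, step=1):
    """Score nested top-N feature subsets and keep the best-scoring one."""
    best_subset, best_score = [], float("-inf")
    for size in range(step, len(ranked_features) + 1, step):
        subset = ranked_features[:size]
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical toy data: 3 cell types, 3 "genes" (column 2 is noise).
samples = [
    [5.0, 0.1, 0.3], [5.2, 0.0, 0.9], [4.9, 0.2, 0.1],  # type A
    [0.0, 3.0, 0.8], [0.3, 3.1, 0.2], [0.6, 2.9, 0.5],  # type B
    [0.1, 0.1, 0.4], [0.2, 0.0, 0.7], [0.7, 0.2, 0.6],  # type C
]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
ranked = [0, 1, 2]  # assumed ranking, as if produced by a method such as MCFS

best_subset, best_score = incremental_feature_selection(
    ranked,
    lambda feats: multiclass_mcc(labels, nearest_centroid_loo(samples, labels, feats)),
)
print(best_subset, round(best_score, 3))
```

In this toy setting the first feature alone cannot separate types B and C, so the search settles on the two top-ranked features; the same loop, applied to a real ranked list with a real classifier and cross-validation scheme, is how a feature count such as 6698 would be selected.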
