MIMIC: an optimization method to identify cell type-specific marker panel for cell sorting

Abstract Multi-omics data allow us to select a small set of informative markers for discriminating specific cell types and studying cellular heterogeneity. However, it is often challenging to choose an optimal marker panel from high-dimensional molecular profiles covering a large number of cell types. Here, we propose a method called Mixed Integer programming Model to Identify Cell type-specific marker panel (MIMIC). MIMIC maintains the hierarchical topology among different cell types while maximizing the specificity of a fixed number of selected markers. MIMIC was benchmarked on the mouse ENCODE RNA-seq dataset, with 29 diverse tissues, using 43 surface markers (SMs) and 1345 transcription factors (TFs). MIMIC selects biologically meaningful markers and is robust across different accuracy criteria. It shows advantages over standard single gene-based approaches and widely used dimensionality reduction methods, such as multidimensional scaling and t-SNE, both in accuracy and in biological interpretation. Furthermore, the combination of SMs and TFs achieves better specificity than SMs or TFs alone. Applying MIMIC to a large collection of 641 RNA-seq samples covering 231 cell types identifies a panel of TFs and SMs that reveals the modularity of cell type association networks. Finally, the scalability of MIMIC is demonstrated by selecting enhancer markers from mouse ENCODE data. MIMIC is freely available at https://github.com/MengZou1/MIMIC.
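To make the idea of choosing a fixed-size panel that preserves the relationships among cell types more concrete, the following is a minimal sketch and not the MIMIC formulation itself. It assumes a small expression matrix (cell types by candidate markers) and replaces the mixed integer program with brute-force enumeration, which is only feasible for toy inputs. The score used here, the Pearson correlation between pairwise cell-type distances computed from the full profile and from the panel, is an illustrative stand-in for "maintaining the hierarchical topology"; the specificity term mentioned in the abstract is not modeled. The names `pairwise_dist` and `select_panel` are hypothetical.

```python
from itertools import combinations

import numpy as np


def pairwise_dist(X):
    """Euclidean distances between rows of X (cell types x markers)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def select_panel(X, k):
    """Return the k-marker panel whose pairwise cell-type distances
    correlate best with those from the full marker set.

    Brute-force enumeration over all k-subsets; an assumption made for
    illustration, standing in for a mixed integer programming solver.
    """
    n_types, n_markers = X.shape
    upper = np.triu_indices(n_types, k=1)   # each cell-type pair once
    full = pairwise_dist(X)[upper]

    best_score, best_panel = -np.inf, None
    for panel in combinations(range(n_markers), k):
        sub = pairwise_dist(X[:, list(panel)])[upper]
        if sub.std() == 0:                   # panel cannot separate any pair
            continue
        score = np.corrcoef(full, sub)[0, 1]
        if score > best_score:
            best_score, best_panel = score, panel
    return best_panel, best_score


if __name__ == "__main__":
    # Toy data: 4 cell types, 5 candidate markers; only markers 0 and 3 vary.
    X_toy = np.array([
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0],
        [1.0, 0.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 1.0, 0.0],
    ])
    print(select_panel(X_toy, k=2))
```

Enumeration grows combinatorially with the number of candidate markers, which is one reason an optimization model with integer selection variables is a natural fit for realistic panel sizes.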
