Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer

Background: Protein biomarkers play important roles in cancer diagnosis. Much effort has gone into measuring abnormal expression intensity in biological samples to identify cancer types and stages. However, changes in the subcellular location of proteins, which are also critical for understanding and detecting disease, have rarely been studied.

Results: In this work, we developed a machine learning model that classifies protein subcellular locations from immunohistochemistry images of human colon tissues, and we validated its ability to detect subcellular location changes of biomarker proteins related to colon cancer. The model takes representative image patches as input and integrates feature engineering with deep learning. It achieves 92.69% accuracy in classifying new proteins. We used two validation datasets of colon cancer biomarkers, one derived from the published literature and one from the Human Protein Atlas database. In these two datasets, 81.82% and 65.66% of the biomarker proteins, respectively, were identified as changing location.

Conclusions: Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and that our model can effectively detect biomarkers based on protein subcellular translocations. We anticipate that this study will be useful for annotating proteins whose subcellular localization is unknown and for discovering new potential location biomarkers.
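The abstract states that the model combines predefined (engineered) features with deep features computed on image patches, but it does not give the details. The sketch below is only an illustration of one common way to do this, not the authors' implementation: it assumes each patch already has a handcrafted feature vector and a deep feature vector, standardizes each block, concatenates them, and then averages per-patch class probabilities to get an image-level label. The function names `fuse_features` and `image_level_prediction`, the z-score normalization, and the averaging rule are all assumptions introduced here.

```python
import numpy as np


def fuse_features(handcrafted, deep):
    """Standardize two per-patch feature blocks and concatenate them.

    handcrafted: array of shape (n_patches, n_handcrafted)
    deep:        array of shape (n_patches, n_deep)
    Returns an array of shape (n_patches, n_handcrafted + n_deep).
    Z-scoring is an assumption of this sketch, not a step from the paper.
    """
    def zscore(x):
        x = np.asarray(x, dtype=float)
        std = x.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
        return (x - x.mean(axis=0)) / std

    return np.hstack([zscore(handcrafted), zscore(deep)])


def image_level_prediction(patch_probs):
    """Average per-patch class probabilities and pick the most likely class.

    patch_probs: array of shape (n_patches, n_classes)
    Returns (predicted_class_index, mean_probabilities).
    Simple averaging is an assumed aggregation rule.
    """
    mean_probs = np.asarray(patch_probs, dtype=float).mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs
```

In use, the fused matrix would be passed to any standard classifier, and the per-patch outputs of that classifier would be aggregated per image. A location change between normal and cancer tissue could then be flagged when the image-level labels differ, though the criterion actually used in the paper is not described in the abstract.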
