Mislocalization-related disease gene discovery using gene expression-based computational protein localization prediction.

Protein sorting is the mechanism that transports proteins to their target subcellular locations after synthesis. Mutations in genes may disrupt this well-regulated sorting process, leading to a variety of mislocalization-related diseases. This paper proposes a methodology for discovering such disease genes based on gene expression data and computational protein localization prediction. A kernel logistic regression based algorithm is used to identify several candidate cancer genes whose products may contribute to cancer through mislocalization within the cell. Our results also show that, compared with a gene co-expression network defined on Pearson correlation coefficients, a co-expression network based on the nonlinear maximal information coefficient (MIC) gives better results for subcellular localization prediction.
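As a minimal illustration of the co-expression idea, the sketch below builds a Pearson-based co-expression network from a genes-by-samples expression matrix and scores each gene by the fraction of its co-expressed neighbours annotated to a given compartment. The function names, the 0.8 threshold, and the toy data are assumptions made for illustration. The neighbour-fraction score is a deliberately simple stand-in and is not the kernel logistic regression algorithm used in the paper; MIC is not computed here.

```python
import numpy as np


def coexpression_adjacency(expr, threshold=0.8):
    """Boolean gene-gene adjacency from absolute Pearson correlation.

    expr: array of shape (n_genes, n_samples). The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    corr = np.corrcoef(expr)            # rows are treated as variables (genes)
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)        # no self-edges
    return adj


def neighbor_fraction(adj, in_compartment):
    """Fraction of each gene's neighbours annotated to a compartment.

    in_compartment: boolean array of length n_genes. Genes with no
    neighbours receive a score of 0.
    """
    degree = adj.sum(axis=1)
    hits = adj[:, in_compartment].sum(axis=1)
    return np.divide(hits, degree,
                     out=np.zeros(len(degree)), where=degree > 0)


# Toy expression matrix: 4 hypothetical genes measured in 4 samples.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 4.0, 1.0, 4.0],
])
adj = coexpression_adjacency(expr)
scores = neighbor_fraction(adj, np.array([False, True, True, False]))
```

A gene whose score disagrees with its annotated or observed location would only be a candidate for closer inspection; the choice of correlation measure (Pearson here, MIC in the paper's comparison) changes which edges the network contains.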
