Towards a Shared, Conceptual Model-Based Understanding of Proteins and Their Interactions

Understanding the human genome is a major research challenge. The complexity and volume of genome data demand effective and efficient data management policies. A first crucial step is to reach a shared understanding of the domain, which is a hard task given the number of different genome data sources. To complicate matters, those data sources cover different parts of genome-based information: we need not only to understand each of them well, but also to integrate and connect all the relevant information. The protein perspective is a good example: rich, well-known repositories such as UniProt provide a great deal of valuable information that is not easy to interpret and manage when we want to generate useful results. Proteomes and basic information, protein-protein interactions, protein structure, protein processing events, protein function, and so on provide a large amount of information that needs to be conceptually characterized and delimited. To facilitate the essential common understanding of the domain, this paper uses the case of proteins, analyzing the data provided by UniProt in order to carry out a sound conceptualization that identifies the relevant domain concepts. The result of this conceptualization process is a conceptual model of proteins, explained in detail in this work. This holistic conceptual model is the result of achieving a precise ontological commitment. It establishes the concepts and relationships that are significant, providing a solid basis for efficiently managing relevant genome data related to proteins.
