Uncertainty, imprecision, and many-valued logics in protein bioinformatics.

Understanding proteins, their structures, functions, mutual interactions, activity in cellular reactions, interactions with drugs, and expression in body cells is key to efficient medical diagnosis, drug production, and patient treatment. Machine learning and data exploration methods supported by many-valued logics make it possible to capture the imprecision and uncertainty that naturally occur in proteins and other biomolecules. Many-valued logics, such as Łukasiewicz logic or fuzzy logic, are non-classical logics that do not restrict truth values to the two classical values, true and false, but admit a larger set of truth degrees. In this paper, we briefly review the use of many-valued logics, especially fuzzy logic, in bioinformatics. We then focus on protein bioinformatics and present selected applications of many-valued logics in the analysis of complex protein structures, including: (1) potential-based protein similarity searching, (2) matching proteins on the basis of secondary structures, (3) 3D protein structure alignment, (4) prediction of intrinsically disordered proteins, and (5) fuzzy querying of large collections of macromolecular Big Data. The results of the presented studies show that many-valued logics can enrich investigations of protein molecules, in which uncertainty and imprecision are prevalent. The paper discusses the benefits observed when applying many-valued logics to selected protein analyses carried out by the author.
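To make the idea of truth degrees concrete, the sketch below implements the standard Łukasiewicz connectives on the interval [0, 1], Zadeh's min/max connectives for comparison, and a trapezoidal membership function of the kind commonly used to model imprecise predicates in fuzzy queries. It is a minimal illustration, not the author's implementation; the function names, the "medium length" predicate, its thresholds, and the helicity degree are hypothetical values chosen only for the example.

```python
"""Minimal sketch of many-valued (Lukasiewicz) connectives and a fuzzy predicate.

Illustrative only: function names and the 'medium length' thresholds are
hypothetical and are not taken from the paper.
"""


def l_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm (strong conjunction): max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def l_or(a: float, b: float) -> float:
    """Lukasiewicz t-conorm (strong disjunction): min(1, a + b)."""
    return min(1.0, a + b)


def l_not(a: float) -> float:
    """Standard negation: 1 - a."""
    return 1.0 - a


def l_implies(a: float, b: float) -> float:
    """Lukasiewicz implication: min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def zadeh_and(a: float, b: float) -> float:
    """Zadeh (Goedel) conjunction: the minimum of the two degrees."""
    return min(a, b)


def zadeh_or(a: float, b: float) -> float:
    """Zadeh disjunction: the maximum of the two degrees."""
    return max(a, b)


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership with support (a, d) and core [b, c].

    Assumes a < b <= c < d.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)  # falling edge


if __name__ == "__main__":
    # Hypothetical fuzzy predicate: "the chain has medium length" (in residues).
    length = 400
    medium = trapezoid(length, 50, 150, 300, 500)
    helical = 0.75  # hypothetical degree to which the chain is "mostly helical"
    print("medium length:", medium)                       # -> 0.5
    print("Lukasiewicz AND:", l_and(medium, helical))     # -> 0.25
    print("Zadeh AND (min):", zadeh_and(medium, helical))  # -> 0.5
```

Restricted to the values 0 and 1, both families of connectives reduce to classical Boolean logic; on intermediate degrees they differ, and the Łukasiewicz conjunction is stricter than the minimum, which is one reason the choice of connective matters when combining imprecise conditions in a query.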
