On the Optimization of Embedding Spaces via Information Granulation for Pattern Recognition

Embedding spaces are one of the mainstream approaches for dealing with structured data. Over the last decade, Granular Computing has emerged as a powerful paradigm for the automatic synthesis of embedding spaces that also yield an interpretable model built on top of meaningful entities known as "information granules". In these contexts, one usually seeks the smallest set of information granules in order to boost model interpretability while keeping performance satisfactory. In this paper, we add a third objective, namely the structural complexity of the resulting model, and we use three biology-related case studies on metabolic networks and protein networks to investigate the link between classification performance, embedding space dimensionality, and structural complexity of the resulting model.
