Clustering Bacteria Species Using Neural Gas: Preliminary Study

This work presents a method for clustering and visualizing bacterial taxonomy. We propose a modified version of the Batch Median Neural Gas (BNG) algorithm, which can handle non-vectorial data given as a dissimilarity matrix. We tested the modified BNG on a dissimilarity matrix obtained by aligning sequences and computing pairwise distances from bacterial genotype information, specifically the 16S rRNA housekeeping gene, which represents a stable part of the bacterial genome. The dataset used for the experiments was obtained from the Ribosomal Database Project II and consists of 5159 16S rRNA gene sequences. Preliminary experimental results show that the proposed algorithm has a promising ability to recover clusters of the actual bacterial taxonomy.
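
To illustrate how a median-type neural gas can operate on a dissimilarity matrix alone, the following is a minimal sketch of the standard batch median neural gas scheme, not the modified version proposed in this work. The function name `median_neural_gas`, its parameters, the annealing schedule, and the toy one-dimensional data are all assumptions made for illustration; in the setting described above, `D` would instead hold pairwise distances computed from aligned 16S rRNA sequences.

```python
import numpy as np

def median_neural_gas(D, n_prototypes, n_epochs=20, seed=0):
    """Sketch of batch median neural gas on a dissimilarity matrix D (n x n).

    Prototypes are restricted to data points, so each one is stored as an
    index into D. Returns (prototype indices, cluster label per data point).
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    # Assumption: initialise prototypes as distinct random data points.
    protos = rng.choice(n, size=n_prototypes, replace=False)
    # Assumption: exponentially annealed neighbourhood range.
    lam_start, lam_end = n_prototypes / 2.0, 0.01
    for t in range(n_epochs):
        lam = lam_start * (lam_end / lam_start) ** (t / max(n_epochs - 1, 1))
        dist = D[protos]                                  # (k, n) prototype-to-point
        # Rank of each prototype for each data point (0 = closest prototype).
        ranks = np.argsort(np.argsort(dist, axis=0), axis=0)
        h = np.exp(-ranks / lam)                          # neighbourhood weights
        # cost[i, l] = sum_j h[i, j] * D[j, l]: cost of placing prototype i
        # on data point l; each prototype moves to its cheapest candidate.
        cost = h @ D
        protos = np.argmin(cost, axis=1)
    labels = np.argmin(D[protos], axis=0)                 # nearest-prototype assignment
    return protos, labels

# Toy example (hypothetical data): two well-separated groups on a line.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])
protos, labels = median_neural_gas(D, n_prototypes=2)
```

Because prototypes are data points, the update needs only entries of `D` and never a vector representation of a sequence. This sketch does not prevent two prototypes from collapsing onto the same data point, which a practical implementation would have to handle.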