
BioCluster: Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data

Presumptive identification of different Enterobacteriaceae species is routinely based on biochemical properties. In traditional practice, each biochemical property of an unknown sample is compared manually with those of known reference samples. The sample's identity is then inferred from the reference whose pattern is most similar. This process is labor-intensive, time-consuming, error-prone, and subjective. Automating the sorting and similarity calculation would therefore be advantageous. Here we present BioCluster, a MATLAB-based graphical user interface (GUI) tool designed for automated clustering and identification of Enterobacteriaceae from biochemical test results. The tool implements two algorithms: traditional hierarchical clustering (HC) and Improved Hierarchical Clustering (IHC), a modified algorithm developed specifically for clustering and identifying Enterobacteriaceae species. IHC takes into account the variability in the results of 1–47 biochemical tests within the Enterobacteriaceae family. The tool also provides options for optimizing the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we demonstrate that BioCluster clusters and identifies enterobacterial species from biochemical test data with high accuracy. The tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/.
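The two ideas in the abstract can be sketched in a few lines: identifying an unknown by its maximum similarity to reference profiles, and traditional agglomerative hierarchical clustering (HC). This sketch rests on several assumptions. Test results are encoded as binary positive/negative vectors. Similarity is the simple matching coefficient, defined as the fraction of tests that agree. Clusters are merged by average linkage (UPGMA). The abstract does not say which distance measure or linkage BioCluster uses, and this code does not reproduce the IHC algorithm. The names `REFERENCES`, `simple_matching`, `identify`, and `average_linkage` are hypothetical, and the reference profiles are made-up illustrative values rather than real Enterobacteriaceae data.

```python
from itertools import combinations

# Hypothetical reference profiles: 1 = positive, 0 = negative for each test.
# Illustrative values only; these are NOT real Enterobacteriaceae data.
REFERENCES = {
    "Ref_A": (1, 0, 1, 1, 0, 1),
    "Ref_B": (1, 1, 0, 1, 0, 0),
    "Ref_C": (0, 1, 0, 1, 1, 0),
}


def simple_matching(a, b):
    """Fraction of biochemical tests on which two profiles agree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same tests")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(unknown, references):
    """Return (name, similarity) for the reference most similar to `unknown`."""
    best = max(references, key=lambda name: simple_matching(unknown, references[name]))
    return best, simple_matching(unknown, references[best])


def average_linkage(profiles):
    """Agglomerative hierarchical clustering with average linkage (UPGMA).

    `profiles` maps a label to a profile. Returns the merge history as a list
    of (cluster_1, cluster_2, distance) tuples, where each cluster is a
    frozenset of labels and distance = 1 - simple matching similarity.
    """
    clusters = [frozenset([name]) for name in profiles]
    history = []

    def distance(c1, c2):
        # Mean pairwise distance between all members of the two clusters.
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(1 - simple_matching(profiles[p], profiles[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters; ties go to the first pair found.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        history.append((c1, c2, distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return history
```

For example, `identify((1, 0, 1, 0, 0, 1), REFERENCES)` matches `Ref_A`, which agrees on 5 of the 6 tests. `average_linkage(REFERENCES)` first merges `Ref_B` and `Ref_C` at distance 1/3 and then joins `Ref_A` at distance 2/3. The simple matching coefficient treats every test equally. That equal weighting is exactly what an algorithm that accounts for per-test variability, as IHC is described to do, would need to relax.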