Genus-wide Bacillus species identification through proper artificial neural network experiments on fatty acid profiles

Gas chromatographic fatty acid methyl ester (FAME) analysis of bacteria is an easy, cheap and fast automated identification tool routinely used in microbiological research. This paper reports on the application of artificial neural networks for genus-wide, FAME-based identification of Bacillus species. Using 1,071 FAME profiles covering a genus-wide spectrum of 477 strains and 82 species, different balanced and imbalanced data sets were created according to different validation methods and model parameters. Following training and validation, each classifier was evaluated on its ability to identify the profiles of a test set. Comparison of the classifiers showed good identification rates, with the imbalanced data sets giving the better results. The presence of the Bacillus cereus and Bacillus subtilis groups made clear that the limited resolution of FAME analysis must be taken into account when constructing identification models. Indeed, because members of such a group cannot easily be distinguished from one another on the basis of FAME data alone, identification models built on these data cannot be expected to keep them apart either. Comparison of the different experimental setups ultimately led to a few general recommendations. Compared with the routinely used commercial Sherlock Microbial Identification System (MIS, Microbial ID, Inc. (MIDI), Newark, Delaware, USA), the artificial neural network test results showed a significant improvement in Bacillus species identification. These results indicate that machine learning techniques such as artificial neural networks are very promising tools for FAME-based classification and identification of bacterial species.
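The abstract does not state exactly how the balanced and imbalanced data sets or the test set were built. The Python sketch below illustrates one plausible reading under stated assumptions. Whole strains are held out, so that replicate profiles of one strain (1,071 profiles from 477 strains) never end up in both the training and the test set. Balancing is done by random oversampling of the smaller species. The identification rate is taken as the fraction of test profiles assigned to their correct species. The profile layout (strain_id, species, features), the function names and the toy values are all hypothetical and are not real FAME data.

```python
import random
from collections import defaultdict

# Hypothetical profile layout: (strain_id, species, features), where
# features would hold relative fatty acid percentages.


def split_by_strain(profiles, test_fraction=0.2, seed=0):
    """Hold out whole strains so that no strain contributes profiles to both sets."""
    strains = sorted({strain for strain, _, _ in profiles})
    rng = random.Random(seed)
    rng.shuffle(strains)
    n_test = max(1, round(len(strains) * test_fraction))
    test_strains = set(strains[:n_test])
    train = [p for p in profiles if p[0] not in test_strains]
    test = [p for p in profiles if p[0] in test_strains]
    return train, test


def oversample_to_balance(profiles, seed=0):
    """Random oversampling: duplicate profiles of smaller species until every
    species has as many profiles as the largest one."""
    by_species = defaultdict(list)
    for p in profiles:
        by_species[p[1]].append(p)
    target = max(len(group) for group in by_species.values())
    rng = random.Random(seed)
    balanced = []
    for species in sorted(by_species):
        group = by_species[species]
        balanced.extend(group)
        # Draw extra copies (with replacement) to reach the target count.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def identification_rate(true_species, predicted_species):
    """Fraction of test profiles assigned to their correct species."""
    if not true_species or len(true_species) != len(predicted_species):
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_species, predicted_species))
    return hits / len(true_species)


# Toy usage with made-up feature values.
profiles = [
    ("s1", "B. subtilis", [12.0, 40.1]),
    ("s1", "B. subtilis", [11.5, 41.0]),
    ("s2", "B. subtilis", [13.2, 39.0]),
    ("s3", "B. cereus", [30.0, 5.5]),
]
train, test = split_by_strain(profiles, test_fraction=0.2)
imbalanced_train = train                      # natural species frequencies kept
balanced_train = oversample_to_balance(train)  # equal count per species
```

Feeding `imbalanced_train` to a classifier corresponds to the imbalanced variant and `balanced_train` to a balanced one. Oversampling is applied only to the training portion, so test profiles are never duplicated and the identification rate is not inflated.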
