Supervised machine learning-based classification of oral malodor based on the microbiota in saliva samples

OBJECTIVE This study presents an effective method of classifying oral malodor from the oral microbiota in saliva using a support vector machine (SVM), an artificial neural network (ANN), and a decision tree. The approach uses the concentration of methyl mercaptan in mouth air as the indicator of oral malodor, and the peak areas of terminal restriction fragment (T-RF) length polymorphisms (T-RFLPs) of the 16S rRNA gene as input data for the supervised machine-learning methods, without identifying the specific species that produce malodorous compounds.

METHODS 16S rRNA genes were amplified from saliva samples from 309 subjects, and T-RFLP analysis was carried out on the amplified DNA fragments. T-RFLP analysis describes the microbiota as a set of fragment lengths and peak areas corresponding to bacterial strains. The peak area of a fragment is equivalent to the frequency with which that fragment is drawn when one molecule is selected from the terminal fragments. A second frequency is obtained by dividing the number of samples that contain a given species by the total number of samples. An SVM, an ANN, and a decision tree were trained on these two frequencies in 308 samples and then used to classify the presence or absence of methyl mercaptan in the mouth air of the remaining subject.

RESULTS The SVM trained on proportions expressed as entropy achieved the highest classification accuracy, with a sensitivity of 51.1% and a specificity of 95.0%. The ANN and the decision tree gave lower classification accuracies. Only the ANN improved when weighted with entropy derived from the frequency of appearance in samples, which raised its accuracy to 81.9%, with a sensitivity of 60.2% and a specificity of 90.5%. The decision tree showed low classification accuracy under all conditions.

CONCLUSIONS Using T-RF proportions and frequencies, models were developed to classify the presence of methyl mercaptan, a volatile sulfur-containing compound that causes oral malodor. SVM classifiers classified the presence of methyl mercaptan with high specificity, and this classification is expected to be useful for screening saliva for oral malodor before visits to specialist clinics. Classification by an SVM or an ANN does not require identifying the oral microbiota species responsible for the malodor, and the ANN also does not require the proportions of T-RFs.
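
The two frequencies and the hold-one-subject-out evaluation described under METHODS can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: the entropy term is taken to be the Shannon contribution -p log2(p), the toy peak areas and labels are invented, all function names are hypothetical, and a nearest-centroid rule stands in for the SVM, ANN, and decision tree, whose settings are not given here.

```python
import math

def peak_proportions(peak_areas):
    """Convert one sample's T-RF peak areas into proportions that sum to 1."""
    total = sum(peak_areas)
    return [a / total if total > 0 else 0.0 for a in peak_areas]

def occurrence_frequency(samples):
    """Fraction of samples in which each T-RF shows a non-zero peak."""
    n = len(samples)
    return [sum(1 for s in samples if s[j] > 0) / n
            for j in range(len(samples[0]))]

def entropy_term(p):
    """Assumed weighting: Shannon contribution -p*log2(p), 0 when p == 0."""
    return -p * math.log2(p) if p > 0 else 0.0

def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not the paper's models): nearest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out(features, labels, predict=nearest_centroid_predict):
    """Train on all samples but one, classify the held-out one, repeat."""
    tp = tn = fp = fn = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        pred = predict(train_x, train_y, features[i])
        if labels[i] == 1:
            tp += pred == 1
            fn += pred != 1
        else:
            tn += pred == 0
            fp += pred != 0
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Toy data: rows are samples, columns are T-RFs; all values are invented.
toy_areas = [[80, 20, 0], [70, 30, 0], [10, 0, 90], [20, 0, 80]]
toy_labels = [0, 0, 1, 1]  # 1 = methyl mercaptan present (hypothetical)

toy_features = [[entropy_term(p) for p in peak_proportions(row)]
                for row in toy_areas]
sens, spec = leave_one_out(toy_features, toy_labels)
print(occurrence_frequency(toy_areas), sens, spec)
```

With 309 subjects, the same loop would train on 308 samples and classify the remaining one in each round. Any trained model could replace the stand-in through the `predict` argument.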
