Data Mining Techniques Applied to Hydrogen Lactose Breath Test

In this work, we present the results of applying data mining techniques to hydrogen breath test data. Disposal of H2 gas is of utmost relevance to maintaining efficient microbial fermentation processes.

Objectives: To analyze a set of hydrogen breath test data using data mining tools and to identify new patterns of H2 production.

Methods: K-means clustering was applied as the data mining technique to hydrogen breath test data sets from 2571 patients.

Results: Six different patterns were extracted from the hydrogen breath test data. We also show the relevance of each of the samples taken throughout the test.

Conclusions: Analysis of the hydrogen breath test data sets using data mining techniques identified new patterns of hydrogen generation upon lactose absorption, which illustrates the potential of applying data mining techniques to clinical data sets. These results offer promising data for future research on the relationship between hydrogen produced by the gut microbiota and clinical symptoms.
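The k-means clustering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the curves are synthetic, the seven sampling points and the ppm values are hypothetical, and k is set to 2 only because the toy data contains two shapes (the study itself reports six patterns). Each patient is treated as a vector of H2 readings, and the algorithm alternates between assigning every curve to its nearest centroid and recomputing each centroid as the mean of its members.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_curve(curves):
    """Point-wise mean of a non-empty list of curves."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]


def kmeans(curves, k, iterations=100, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct curves drawn from the data.
    centroids = [list(c) for c in rng.sample(curves, k)]
    labels = [-1] * len(curves)
    for _ in range(iterations):
        # Assignment step: each curve goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(curve, centroids[j]))
            for curve in curves
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_curve(members)
    return labels, centroids


def synthetic_curves(n_per_group=20, seed=1):
    """Hypothetical H2 readings (ppm) at seven sampling points per patient."""
    rng = random.Random(seed)
    flat = [5, 5, 6, 6, 5, 5, 6]          # assumed non-producer shape
    rising = [5, 10, 25, 45, 60, 70, 75]  # assumed late-rise shape
    curves = []
    for template in (flat, rising):
        for _ in range(n_per_group):
            curves.append([v + rng.uniform(-2, 2) for v in template])
    return curves


curves = synthetic_curves()
labels, centroids = kmeans(curves, k=2)
for j, centroid in enumerate(centroids):
    size = labels.count(j)
    print(f"cluster {j}: {size} curves, centroid {[round(v, 1) for v in centroid]}")
```

Each centroid returned here is itself a curve, so it can be read as the typical H2 profile of its cluster. In practice the number of clusters would be chosen with a validity index rather than fixed in advance.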
