Accurately Inferring Compliance to Five Major Food Guidelines Through Simplified Surveys: Applying Data Mining to the UK National Diet and Nutrition Survey

Background National surveys in public health nutrition commonly record the weight of every food consumed by an individual. However, if the goal is to identify whether individuals comply with the 5 main national nutritional guidelines (sodium, saturated fats, sugars, fruit and vegetables, and fats), much less information may be needed. A previous study showed that tracking only 2.89% of all foods (113/3911) was sufficient to accurately identify compliance. Further reducing the data needs could lower participation burden, thus decreasing the cost of monitoring national compliance with key guidelines.
Objective This study aimed to assess whether national public health nutrition surveys can be further simplified by only recording whether a food was consumed, rather than having to weigh it.
Methods Our dataset came from a generalized sample of inhabitants in the United Kingdom, more specifically from the National Diet and Nutrition Survey 2008-2012. After simplifying food consumption to a binary value (1 if an individual consumed a food and 0 otherwise), we built and optimized decision trees to find whether the foods could accurately predict compliance with the 5 major nutritional guidelines.
Results When using decision trees of a similar size to those in previous studies (ie, involving as many foods), we were able to correctly infer compliance for the 5 guidelines with an average accuracy of 80.1%. This is an average increase of 2.5 percentage points over a previous study, showing that further simplifying the surveys can actually yield more robust estimates. When we allowed the new decision trees to use slightly more foods than in previous studies, we were able to optimize the performance with an average increase of 3.1 percentage points.
Conclusions Although one may expect a further simplification of surveys to decrease accuracy, our study found that public health dietary surveys can be simplified (from accurately weighing items to simply checking whether they were consumed) while improving accuracy. One possibility is that the simplification reduced noise and made it easier for patterns to emerge. Using simplified surveys will allow public health nutrition to be monitored more cost-effectively and may decrease the number of errors as participation burden is reduced.
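
The approach summarized above (reduce each food to a consumed/not consumed flag, then fit a decision tree that predicts compliance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the four foods, the 80 g portions, the compliance rule, and the greedy Gini-based tree builder are all hypothetical, and this is not the study's actual pipeline or data. The example also scores the tree on its own training rows; a real analysis would estimate accuracy on held-out data, for instance through cross-validation.

```python
from collections import Counter
from itertools import product


def binarize(weights_in_grams):
    """Turn recorded food weights into consumed (1) / not consumed (0) flags."""
    return [1 if w > 0 else 0 for w in weights_in_grams]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most frequent label (first seen wins on ties)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, max_depth):
    """Greedy tree on binary features; each split asks 'was food j consumed?'.

    Returns either a leaf label (0 or 1) or a tuple
    (food_index, subtree_if_not_consumed, subtree_if_consumed).
    """
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None  # (weighted impurity, food index)
    for j in range(len(rows[0])):
        absent = [y for x, y in zip(rows, labels) if x[j] == 0]
        present = [y for x, y in zip(rows, labels) if x[j] == 1]
        if not absent or not present:
            continue  # this food does not separate the rows
        score = (len(absent) * gini(absent) + len(present) * gini(present)) / len(labels)
        if best is None or score < best[0]:
            best = (score, j)
    if best is None:
        return majority(labels)
    j = best[1]
    no = [(x, y) for x, y in zip(rows, labels) if x[j] == 0]
    yes = [(x, y) for x, y in zip(rows, labels) if x[j] == 1]
    return (
        j,
        build_tree([x for x, _ in no], [y for _, y in no], max_depth - 1),
        build_tree([x for x, _ in yes], [y for _, y in yes], max_depth - 1),
    )


def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, if_absent, if_present = tree
        tree = if_present if row[j] == 1 else if_absent
    return tree


def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(tree, x) == y for x, y in zip(rows, labels)) / len(labels)


# Hypothetical data: every combination of 4 foods, each weighed as 0 g or 80 g.
weights = [list(combo) for combo in product([0, 80], repeat=4)]
X = [binarize(w) for w in weights]
# Hypothetical rule: non-compliant (0) only when foods 0 and 3 are both consumed.
y = [0 if x[0] == 1 and x[3] == 1 else 1 for x in X]

tree = build_tree(X, y, max_depth=3)
print(tree)                  # (0, 1, (3, 1, 0))
print(accuracy(tree, X, y))  # 1.0
```

In this toy case the tree needs only two of the four foods, which mirrors the idea of inferring compliance from a small subset of survey items. Capping `max_depth` is one simple way to limit how many foods a tree may ask about.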
