Fast and Accurate Approaches for Large-Scale, Automated Mapping of Food Diaries on Food Composition Tables

Aim of Study: Weighed food diaries provide a powerful method to quantify food and nutrient intakes in nutritional studies. Yet mapping these records onto food composition tables (FCTs) is a challenging, time-consuming and error-prone process. Experts perform this mapping manually, and no automated approach has previously been proposed. Our study aimed to assess automated approaches to map food items onto FCTs. Methods: We used food diaries (~170,000 records pertaining to 4,200 unique food items) from the DiOGenes randomized clinical trial. We attempted to map these items onto six FCTs available from the EuroFIR resource. Two approaches were tested. The first was based solely on food name similarity (fuzzy matching). The second used a machine learning approach (C5.0 classifier) combining fuzzy matching and food energy. We tested mapping food items using their original names and also an English translation. Top matching pairs were reviewed manually to derive performance metrics: precision (the percentage of mapped items that were correctly mapped) and recall (the percentage of queried items that were mapped). Results: The simpler approach, fuzzy matching, performed very well. Under a relaxed threshold (score > 50%), it mapped 99.49% of the items with a precision of 88.75%. With a slightly more stringent threshold (score > 63%), precision improved to 96.81% while recall remained above 95% (i.e., fewer than 5% of the queried items would not be mapped). The machine learning approach did not improve on fuzzy matching overall. However, it substantially increased the recall rate for food items without any clear equivalent in the FCTs (+7% and +20% when mapping items by their original and English-translated names, respectively). Our approaches have been implemented as R packages and are freely available from GitHub. Conclusion: This study is the first to provide automated approaches for large-scale food item mapping onto FCTs. We demonstrate that both high precision and high recall can be achieved. Our solutions can be used with any FCT and do not require any programming background. These methodologies and findings are useful to any nutritional study, small or large, observational or interventional.
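
The fuzzy-matching step and the two performance metrics can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the similarity score used here (a Levenshtein edit distance normalized to a 0-100% scale) is an assumption, as are the function names, the toy food names and the toy FCT entries. Only the threshold logic (keep the top match when its score exceeds a cut-off such as 63%) and the definitions of precision and recall follow the abstract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed score: 100 * (1 - distance / longer length), case-insensitive."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def map_items(items, fct_names, threshold):
    """Map each item to its top-scoring FCT entry if the score exceeds threshold."""
    mapping = {}
    for item in items:
        best = max(fct_names, key=lambda name: similarity(item, name))
        if similarity(item, best) > threshold:
            mapping[item] = best
    return mapping


def precision_recall(mapping, n_items, truth):
    """Precision: correct / mapped. Recall: mapped / queried (as in the abstract)."""
    mapped = len(mapping)
    correct = sum(1 for item, name in mapping.items() if truth.get(item) == name)
    precision = correct / mapped if mapped else 0.0
    recall = mapped / n_items if n_items else 0.0
    return precision, recall


# Toy data (hypothetical): three diary items, three FCT entries.
fct = ["apple, raw", "whole milk", "white bread"]
diary = ["apple raw", "whole milk", "zzzz"]
reviewed = {"apple raw": "apple, raw", "whole milk": "whole milk"}  # manual review

mapping = map_items(diary, fct, threshold=63)
precision, recall = precision_recall(mapping, len(diary), reviewed)
print(mapping, precision, recall)
```

Raising the threshold trades recall for precision: fewer items are mapped, but those that remain are more likely to be correct, which mirrors the 50% versus 63% comparison reported in the Results.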
