Comparing Semantic and Nutrient Value Similarities of Recipes

Although food and nutrition have been studied for centuries, modern nutritional science is surprisingly young. Human knowledge about food and nutrition has evolved drastically over time and has expanded especially in the last few decades; with information being mass-produced and available everywhere, in every form, it is easy to become overwhelmed and confused about what is right and what is wrong. Macronutrient assessment is a crucial task for individuals suffering from various diseases, is highly relevant for professional athletes, and is becoming part of everyday life for many people for health or fitness reasons. Assessing the nutritional components of food is challenging and requires a reliable source of data. In this paper, we introduce an approach for finding recipes that are similar with regard to their macronutrient values, based on learned recipe vector representations. Using a scientifically validated dataset of recipes containing textual descriptions and macronutrient values, we train word and paragraph embeddings, learn concept representations for the textual descriptions, calculate the similarity between the embeddings, and compare it with the similarity between the nutrient values. The results show a strong correlation between these two similarities. This study is a promising starting point for future work in this direction, such as multi-label classification or regression for predicting the nutrient content of food.
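
To make the described pipeline concrete, the following is a minimal sketch, assuming gensim's Doc2Vec as the paragraph-embedding model. The abstract does not specify the similarity measure or the correlation statistic, so cosine similarity and Spearman's rank correlation are assumptions here, and the recipe descriptions and macronutrient values are hypothetical illustration data.

```python
# A minimal sketch of the pipeline from the abstract: embed recipe
# descriptions, compute pairwise similarity in embedding space and in
# nutrient space, and correlate the two. Doc2Vec, cosine similarity, and
# Spearman's rho are assumptions; the data below are hypothetical.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from scipy.stats import spearmanr

# Hypothetical recipe descriptions and macronutrient vectors
# (protein, fat, carbohydrate in grams per 100 g).
recipes = [
    "grilled chicken breast with steamed rice and broccoli",
    "baked salmon fillet with quinoa and roasted asparagus",
    "chocolate fudge brownie with vanilla ice cream",
    "beef stew with potatoes carrots and onions",
]
nutrients = np.array([
    [25.0,  4.0, 30.0],
    [22.0, 12.0, 25.0],
    [ 5.0, 20.0, 55.0],
    [15.0, 10.0, 18.0],
])

# Learn paragraph embeddings for the textual descriptions.
docs = [TaggedDocument(words=text.split(), tags=[i])
        for i, text in enumerate(recipes)]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=200, seed=1)
embeddings = np.array([model.dv[i] for i in range(len(recipes))])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Pairwise similarities over all recipe pairs, in both spaces.
pairs = [(i, j) for i in range(len(recipes))
         for j in range(i + 1, len(recipes))]
semantic_sim = [cosine(embeddings[i], embeddings[j]) for i, j in pairs]
nutrient_sim = [cosine(nutrients[i], nutrients[j]) for i, j in pairs]

# Correlate semantic similarity with nutrient-value similarity.
rho, p = spearmanr(semantic_sim, nutrient_sim)
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")
```

On a real corpus, a strongly positive rho would indicate that recipes whose descriptions are semantically close also tend to have similar macronutrient profiles, which is the relationship the paper reports.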
