Cooking Recipe Analysis based on Sequences of Distributed Representation on Procedure Texts and Associated Images

Nowadays, online community sites about cooking and eating activities are recognized as an indispensable infrastructure for daily life. In order to respond accurately to increasingly sophisticated user requirements, it is necessary to extract the characteristics of each cooking recipe and to clarify the relationships among recipes. Since mapping each recipe into a vector space via representation learning is one of the most promising approaches to cooking recipe analysis, a wide variety of distributed representations of recipes have been proposed. In this paper, to provide a precise representation of cooking recipes from a different perspective, we propose to represent each recipe by two sequences of distributed representations. One sequence is obtained from the cooking steps in the recipe text using BERT, and the other is derived from the sequence of images associated with the cooking process using the VGG16 convolutional neural network. To assess the effectiveness of the proposal, we perform cluster analysis of recipes for four dishes based on the standard dynamic time warping (DTW) distance between the sequential distributed representations.
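The comparison described above can be sketched as follows: each recipe becomes a sequence of per-step embedding vectors (from BERT for text, or VGG16 for images), and two recipes with different numbers of steps are compared with standard DTW. This is a minimal illustrative sketch, not the authors' implementation; the function name and the use of Euclidean distance as the local cost are assumptions, since the paper excerpt does not specify the local metric.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Standard DTW distance between two sequences of embedding vectors.

    seq_a, seq_b: lists of equal-dimensional NumPy vectors, one per
    cooking step (e.g. BERT sentence embeddings or VGG16 features).
    The sequences may have different lengths.
    """
    n, m = len(seq_a), len(seq_b)
    # cost[i, j] = minimal accumulated cost of aligning the first i
    # steps of seq_a with the first j steps of seq_b.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between step embeddings
            # (an assumption for this sketch).
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # Classic DTW recurrence: extend the cheapest of the three
            # predecessor alignments (insertion, deletion, match).
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]
```

Because DTW warps the step axis, a recipe whose single instruction is split into two shorter steps can still align closely with an otherwise identical recipe; the resulting pairwise distance matrix can then feed any standard clustering method.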

[1] Shuhei Yamamoto, et al. Finding Method of Replaceable Ingredients using Large Mounts of Cooking Recipes, 2014.

[2] Tomonobu Ozaki, et al. Learning Distributed Representation of Recipe Flow Graphs via Frequent Subgraphs, 2019, CEA@ICMR.

[3] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[4] S. Chiba, et al. Dynamic programming algorithm optimization for spoken word recognition, 1978.

[5] Cordelia Schmid, et al. VideoBERT: A Joint Model for Video and Language Representation Learning, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).

[6] Donald J. Berndt, et al. Finding Patterns in Time Series: A Dynamic Programming Approach, 1996, Advances in Knowledge Discovery and Data Mining.

[7] Keiji Yanai, et al. Simultaneous estimation of food categories and calories with multi-task CNN, 2017, Fifteenth IAPR International Conference on Machine Vision Applications (MVA).

[8] Antonio Torralba, et al. Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Jason Weston, et al. StarSpace: Embed All The Things!, 2017, AAAI.

[11] Yoko Yamakata, et al. Categorization of Cooking Actions Based on Textual/Visual Similarity, 2019, MADiMa @ ACM Multimedia.

[12] Sirisha Velampalli, et al. Frequent SubGraph Mining Algorithms: Framework, Classification, Analysis, Comparisons, 2018.

[13] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[14] Yoko Yamakata, et al. FlowGraph2Text: Automatic Sentence Skeleton Compilation for Procedural Text Generation, 2014, INLG.

[15] Quoc V. Le, et al. Distributed Representations of Sentences and Documents, 2014, ICML.

[16] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[17] Jun Harashima, et al. Cookpad Image Dataset: An Image Collection as Infrastructure for Food Research, 2017, SIGIR.

[18] Yoko Yamakata, et al. Flow Graph Corpus from Recipe Texts, 2014, LREC.

[19] Lav R. Varshney, et al. A Neural Network System for Transformation of Regional Cuisine Style, 2017, Front. ICT.

[20] Vladimir Pavlovic, et al. Deep Cooking: Predicting Relative Food Ingredient Amounts from Images, 2019, MADiMa @ ACM Multimedia.