What You Use, Not What You Do: Automatic Classification of Recipes

Social media data is notoriously noisy and unclean. Recipe collections built by users are no exception, particularly when it comes to cataloging them. However, consistent and transparent categorization is vital to users who search for a specific entry. Curators of a large collection of existing recipes face the same challenge: they first need to understand the data before they can build a clean system of categories. This paper presents an empirical study on the automatic classification of recipes on the German cooking website Chefkoch. The central question we aim to answer is: Which information is necessary to perform well at this task? In particular, we compare features extracted from the free-text instructions of the recipe to those taken from the list of ingredients. On a sample of 5,000 recipes with 87 classes, our feature analysis shows that a combination of nouns from the textual description of the recipe with ingredient features performs best (48% \(\text{F}_1\)). Nouns alone achieve 45% \(\text{F}_1\) and ingredients alone 46% \(\text{F}_1\). Other word classes, however, do not complement the information from nouns. On a bigger training set of 50,000 instances, the best configuration improves to 57%, highlighting the importance of a sizeable data set.
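
The combination of noun features and ingredient features described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the implementation used in the study: the function name `combine_features`, the `noun=` and `ingr=` prefixes, and the example recipe are hypothetical, and the nouns are assumed to have been extracted from the instructions by a part-of-speech tagger beforehand.

```python
from collections import Counter


def combine_features(nouns, ingredients):
    """Merge two feature sources into one bag-of-features Counter.

    Prefixes keep the sources apart, so the noun "butter" from the
    instructions and the ingredient "butter" stay distinct features.
    (Hypothetical helper; the prefix scheme is an assumption.)
    """
    features = Counter()
    for noun in nouns:
        features["noun=" + noun.lower()] += 1
    for ingredient in ingredients:
        features["ingr=" + ingredient.lower()] += 1
    return features


# Hypothetical recipe; the nouns are assumed to be pre-extracted.
recipe = {
    "nouns": ["Teig", "Ofen", "Butter"],
    "ingredients": ["Mehl", "Butter", "Zucker"],
}
features = combine_features(recipe["nouns"], recipe["ingredients"])
```

Dropping one of the two loops gives the nouns-only or ingredients-only configuration, which is one way the three settings compared in the abstract could share a single feature representation.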
