A Machine Learning Approach to Recipe Text Processing

We propose a machine learning approach to the recipe text processing problem, which aims to convert a recipe text into a workflow. In this paper, we focus on natural language processing (NLP) tasks such as word identification, named entity recognition, and syntactic analysis to extract predicate-argument structures (tuples of a verbal expression and its arguments) from a sentence in a recipe text. Predicate-argument structures are subgraphs of the workflow of a recipe. We solve these problems with methods based on machine learning techniques. The recipe domain, however, differs from the general domain, in which many language resources are available, so we have to adapt NLP systems to recipe texts by preparing annotated data in the recipe domain. To reduce the cost of this adaptation, we adopt a pointwise framework that allows analyzers to be trained from partially annotated data. The experimental results show that adaptation works well for each NLP task and that, with all the adaptations combined, the accuracy of the entire system increases. We conclude that more adaptation work helps develop an accurate recipe-text-to-flow system.
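To illustrate why a pointwise framework can be trained from partially annotated data, the following is a minimal sketch for the word identification step. It assumes that each gap between two adjacent characters is classified independently as a word boundary or not, so gaps without an annotation can simply be skipped during training. Everything here is hypothetical and for illustration only: the names `gap_features`, `extract_examples`, `train_perceptron`, and `segment`, the feature set, and the toy Latin-script data. A simple perceptron is swapped in as the classifier and is not claimed to be the classifier used in the paper.

```python
from collections import defaultdict

def gap_features(text, i):
    """Features for the gap between text[i] and text[i + 1] (hypothetical feature set)."""
    left, right = text[i], text[i + 1]
    return [f"L={left}", f"R={right}", f"LR={left}{right}", "bias"]

def extract_examples(sentences):
    """Collect (features, label) pairs for annotated gaps only.

    Each sentence is (text, labels), where labels[i] describes the gap after text[i]:
    1 = word boundary, 0 = no boundary, None = not annotated.
    """
    examples = []
    for text, labels in sentences:
        assert len(labels) == len(text) - 1
        for i, y in enumerate(labels):
            if y is None:
                continue  # partial annotation: unlabeled gaps contribute nothing
            examples.append((gap_features(text, i), y))
    return examples

def train_perceptron(examples, epochs=10):
    """Train a binary perceptron; each gap is an independent training instance."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, y in examples:
            pred = 1 if sum(w[f] for f in feats) > 0 else 0
            if pred != y:
                delta = 1.0 if y == 1 else -1.0
                for f in feats:
                    w[f] += delta
    return w

def segment(text, w):
    """Split text into words by classifying every gap independently."""
    words, start = [], 0
    for i in range(len(text) - 1):
        if sum(w.get(f, 0.0) for f in gap_features(text, i)) > 0:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words

# Toy data (assumption): unsegmented strings stand in for recipe text.
TRAIN = [
    ("cutonion", [0, 0, 1, 0, 0, 0, 0]),  # fully annotated: cut|onion
    ("boilwater", [None, None, None, 1, None, None, None, None]),  # only one gap annotated
]

weights = train_perceptron(extract_examples(TRAIN))
print(segment("cutonion", weights))  # ['cut', 'onion']
```

In this sketch the second sentence contributes a single training instance, which is the point of partial annotation: an annotator can label only the domain-specific spots that matter, and the remaining gaps neither help nor harm training.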
