Using Word Embeddings to Learn a Better Food Ontology

Food ontologies require significant effort to create and maintain because they involve manual, time-consuming tasks, often with limited alignment to the underlying food science knowledge. We propose a semi-supervised framework that uses word embeddings to automatically populate an ontology from an existing ontology scaffold. Applying the framework to the food domain and evaluating it against the expert-curated ontology FoodOn, we observe that food word embeddings capture the latent relationships and characteristics of foods. The resulting ontology, which uses word embeddings trained on the Wikipedia corpus, has an improvement of 89.7% in precision when compared to the expert-curated ontology FoodOn (0.34 vs. 0.18, respectively; p-value = 2.6 × 10^-138), and it has a 43.6% shorter path distance (hops) between predicted and actual food instances (2.91 vs. 5.16, respectively; p-value = 4.7 × 10^-84) when compared to other methods. This work demonstrates how high-dimensional representations of food can be used to populate ontologies, and it paves the way for learning ontologies that integrate contextual information from a variety of sources and types.
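The abstract does not spell out the scoring rule used to place a food into a class, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes each class in the scaffold has a few seed entities, represents a class by the centroid of its seeds' embeddings, and assigns a new entity to the class with the highest cosine similarity. It also shows one way to count hops between a predicted and an actual class with a breadth-first search. The toy vectors, the class names, and the helpers `assign_class` and `hop_distance` are all hypothetical.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def assign_class(entity_vec, seeded_classes, embeddings):
    """Pick the class whose seed centroid is most similar (assumed rule)."""
    best_cls, best_score = None, -2.0
    for cls, seeds in seeded_classes.items():
        score = cosine(entity_vec, centroid([embeddings[s] for s in seeds]))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


def hop_distance(edges, start, goal):
    """Shortest number of hops between two classes, treating edges as undirected."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path between the two classes


# Toy 3-dimensional "embeddings" (hypothetical values, not trained vectors).
embeddings = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "cheddar": [0.1, 0.9, 0.2],
    "brie": [0.0, 0.8, 0.3],
    "banana": [0.85, 0.15, 0.05],
    "gouda": [0.05, 0.85, 0.25],
}
# Scaffold: each class keeps a few seed entities; the rest must be placed.
seeded_classes = {"fruit": ["apple", "pear"], "cheese": ["cheddar", "brie"]}
edges = [("food", "fruit"), ("food", "cheese")]

predicted = {
    name: assign_class(embeddings[name], seeded_classes, embeddings)
    for name in ("banana", "gouda")
}
print(predicted)
print(hop_distance(edges, "fruit", "cheese"))
```

In this sketch a correct placement has a hop distance of 0, and a misplacement is penalized by how far the predicted class sits from the true class in the hierarchy, which is the intuition behind reporting path distance alongside precision.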
