Data mining model for food safety incidents based on structural analysis and semantic similarity

Food safety is vital to public health and social stability. In this paper, we analyzed the characteristics of food safety incidents (FSIs) reported by mainstream media in China, including their spatial distribution, food categories, risk factors, and supply chain links. Based on this analysis, we constructed a semantic template for FSI-related text data. We then introduced a multi-layer, multi-level semantic structure of rank (MMSS-Rank) algorithm to measure the similarity between collected food safety data and the semantic template. Next, we calculated the overall scores from three components (text layer weight, semantic template weight, and keyword density matrix) and selected an appropriate threshold to determine the accuracy of the FSI data. Results showed that, compared with traditional methods, MMSS-Rank identifies large-scale FSI data efficiently and robustly, with higher accuracy and recall.
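The scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' MMSS-Rank implementation: the template keywords, the weight values, the two text layers (title and body), the way the three components are multiplied and summed, and the threshold of 0.05 are all assumptions chosen only to show how layer weights, template weights, and keyword density might be combined into one score and compared with a threshold.

```python
# Hypothetical semantic template: one keyword set per FSI dimension
# named in the abstract. The keywords themselves are illustrative.
TEMPLATE = {
    "location": {"beijing", "shanghai", "guangdong"},
    "food_category": {"milk", "pork", "rice"},
    "risk_factor": {"melamine", "additive", "bacteria"},
    "supply_chain_link": {"production", "processing", "retail"},
}

# Assumed weights: every template dimension counts equally,
# and the title layer counts more than the body layer.
TEMPLATE_WEIGHTS = {dimension: 0.25 for dimension in TEMPLATE}
LAYER_WEIGHTS = {"title": 0.6, "body": 0.4}


def keyword_density(tokens, keywords):
    """Fraction of tokens that belong to the given keyword set."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in keywords) / len(tokens)


def overall_score(document):
    """Weighted sum of keyword densities over all layers and dimensions.

    `document` maps a layer name ("title", "body") to its text.
    """
    score = 0.0
    for layer, layer_weight in LAYER_WEIGHTS.items():
        tokens = document.get(layer, "").lower().split()
        for dimension, keywords in TEMPLATE.items():
            density = keyword_density(tokens, keywords)
            score += layer_weight * TEMPLATE_WEIGHTS[dimension] * density
    return score


def is_fsi(document, threshold=0.05):
    """Flag a document as FSI-related when its score reaches the threshold."""
    return overall_score(document) >= threshold


fsi_doc = {
    "title": "melamine found in milk",
    "body": "the processing plant in beijing recalled milk",
}
other_doc = {
    "title": "football match result",
    "body": "the home team won",
}

print(is_fsi(fsi_doc), is_fsi(other_doc))
```

In practice the threshold would be tuned on labeled data, trading accuracy against recall, and whitespace tokenization would be replaced by a tokenizer suited to the source language.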
