FoodSIS: a text mining system to improve the state of food safety in singapore

Food safety is an important health issue in Singapore as the number of food poisoning cases have increased significantly over the past few decades. The National Environment Agency of Singapore (NEA) is the primary government agency responsible for monitoring and mitigating the food safety risks. In an effort to pro-actively monitor emerging food safety issues and to stay abreast with developments related to food safety in the world, NEA tracks the World Wide Web as a source of news feeds to identify food safety related articles. However, such information gathering is a difficult and time consuming process due to information overload. In this paper, we present FoodSIS, a system for end-to-end web information gathering for food safety. FoodSIS improves efficiency of such focused information gathering process with the use of machine learning techniques to identify and rank relevant content. We discuss the challenges in building such a system and describe how thoughtful system design and recent advances in machine learning provide a framework that synthesizes interactive learning with classification to provide a system that is used in daily operations. We conduct experiments and demonstrate that our classification approach results in improving the efficiency by average 35% compared to a conventional approach and the ranking approach leads to average 16% improvement in elevating the ranks of relevant articles.

[1]  Fei Wang,et al.  Semi-Supervised Classification with Universum , 2008, SDM.

[2]  Gideon S. Mann,et al.  Learning from labeled features using generalized expectation criteria , 2008, SIGIR '08.

[3]  Ramesh Nallapati,et al.  Discriminative models for information retrieval , 2004, SIGIR '04.

[4]  S. Sathiya Keerthi,et al.  Optimization Techniques for Semi-Supervised Support Vector Machines , 2008, J. Mach. Learn. Res..

[5]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[6]  Yi Zhang,et al.  Average Precision , 2009, Encyclopedia of Database Systems.

[7]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[8]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[9]  Vikas Sindhwani,et al.  Concept Labeling: Building Text Classifiers with Minimal Supervision , 2011, IJCAI.

[10]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[11]  Michael R. Lyu,et al.  Can irrelevant data help semi-supervised learning, why and how? , 2011, CIKM '11.

[12]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[13]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[14]  Tong Zhang,et al.  The Value of Unlabeled Data for Classification Problems , 2000, ICML 2000.

[15]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[16]  Avrim Blum,et al.  Learning from Labeled and Unlabeled Data using Graph Mincuts , 2001, ICML.