Inspection-oriented coding service based on machine learning and semantics mining

HS codes have been adopted by the majority of countries as the basis for import and export inspection and for the generation of trade statistics. Customs authorities and international traders need an HS code query tool to make their processing efficient and automatic. Because HS codes are identified at 5–7 levels of classification, an intelligent coding service needs to combine a knowledge database with techniques from data mining, machine learning and semantic reasoning. In this paper, the authors propose a comprehensive solution for such a coding service and present the architecture, related techniques, technical solution and implementation considerations of the proposed system. Several of the proposed functions and implementation techniques have been developed and deployed by the Shanghai International Airport Entry-Exit Inspection and Quarantine Bureau. The coding service has been published as a Web service and has the potential to be widely used by authorities and international traders around the world. The proposed system may also suit other applications that involve coding or classification processes, such as RFID-based or product-ontology-based applications.
