Knowledge Discovery in Database: Induction Graph and Cellular Automaton

In this article we present the general architecture of a cellular machine that makes it possible to reduce the size of induction graphs and to automatically optimize the generation of symbolic rules. Our objective is to propose a tool for detecting and eliminating irrelevant variables from the database. The goal, after acquisition by machine learning from a set of data, is to reduce storage complexity and thus decrease computing time. This work experiments with a cellular machine for inference systems containing rules. Our system relies on the graphs generated by the SIPINA method. After an introduction that positions our contribution within the area of machine learning, we briefly present the SIPINA method for automatic knowledge extraction from data. We then describe our cellular system and the knowledge post-processing phase, in particular the validation and use of the extracted knowledge. The system is presented mostly through an example taken from medical diagnosis.
