Pattern Discovery by Residual Analysis and Recursive Partitioning

This paper proposes a novel method of pattern discovery based on a theoretical formulation of a contingency table of events. Using residual analysis and recursive partitioning, the method identifies statistically significant events in a data set. These events constitute the important information contained in the data set and are easily interpreted as simple rules, contour plots, or parallel axes plots. The discovery process also automatically furnishes an informative probabilistic description of the data. Following the theoretical formulation, experiments with real and simulated data demonstrate the discovery of subtle patterns amid noise, invariance to changes of scale, cluster detection, and the discovery of multidimensional patterns. The results show that the pattern discovery method offers the advantages of easy interpretation, rapid training, and tolerance to noncentralized noise.
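As a rough illustration of what residual analysis on a contingency table can look like, the sketch below computes adjusted residuals for a two-way table of counts and flags cells that deviate significantly from expectation. It is not the paper's algorithm: the independence model for expected counts, the Haberman-style adjusted residual, the two-sided threshold of 1.96, and the function names `adjusted_residuals` and `significant_events` are all assumptions made for this example, and the recursive partitioning step is omitted.

```python
import math


def adjusted_residuals(table):
    """Adjusted residuals for a 2-D table of counts.

    Assumes expected counts come from an independence model
    (row total * column total / grand total); this is an
    illustrative choice, not taken from the paper.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            # Estimated variance of (observed - expected) under independence.
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            if variance > 0:
                out_row.append((observed - expected) / math.sqrt(variance))
            else:
                out_row.append(0.0)
        result.append(out_row)
    return result


def significant_events(table, z=1.96):
    """Return (row, column, residual) for cells with |residual| > z.

    The threshold z = 1.96 (about the 5% two-sided normal level)
    is an assumed default.
    """
    residuals = adjusted_residuals(table)
    return [
        (i, j, r)
        for i, row in enumerate(residuals)
        for j, r in enumerate(row)
        if abs(r) > z
    ]


# Hypothetical counts: two discretized variables with a strong association.
counts = [[30, 10], [10, 30]]
events = significant_events(counts)
for i, j, r in events:
    direction = "more" if r > 0 else "fewer"
    print(f"cell ({i}, {j}): {direction} observations than expected, residual {r:.2f}")
```

A positive residual marks a cell that holds more observations than the model expects, and a negative residual marks one that holds fewer. Either kind could be reported as a simple rule over the cell's interval bounds.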
