Scalable Model for Extensional and Intensional Descriptions of Unclassified Data

Knowledge discovery from unlabeled data comprises two main tasks: identification of "natural groups" and analysis of these groups in order to interpret their meaning. These tasks are accomplished by unsupervised and supervised learning, respectively, and correspond to the taxonomy and explanation phases of the discovery process described by Langley [9]. The efforts of the Knowledge Discovery from Databases (KDD) research field have addressed these two processes along two main dimensions: (1) scaling up the learning algorithms to very large databases, and (2) improving the efficiency of the knowledge discovery process. In this paper we argue that the advances achieved in scaling up supervised and unsupervised learning algorithms allow us to combine these two processes into a single model, providing extensional (who belongs to each group) and intensional (what features best describe each group) descriptions of unlabeled data. To explore this idea we present an artificial neural network (ANN) architecture, using as building blocks two well-known models: the ART1 network, from the Adaptive Resonance Theory family of ANNs [4], and the Combinatorial Neural Model (CNM), proposed by Machado ([11] and [12]). Both models satisfy one important desideratum for data mining: learning in a single pass over the database. Moreover, CNM, the intensional part of the architecture, allows one to obtain rules directly from its structure. These rules capture the insights into the groups. The architecture can be extended to other supervised/unsupervised learning algorithms that comply with the same desideratum.
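To make the single-pass property concrete, the following is a minimal sketch of the kind of one-pass clustering behavior ART1 provides, assuming binary input patterns and fast learning; the function and parameter names (`art1_cluster`, `rho`, `beta`) are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def art1_cluster(patterns, rho=0.7, beta=1.0):
    """Single-pass ART1-style clustering of binary patterns.

    rho  : vigilance parameter in (0, 1]; higher values yield finer categories.
    beta : choice parameter (> 0) used to rank candidate categories.
    Returns (prototypes, assignments): one binary prototype per category,
    and the category index assigned to each input pattern.
    """
    prototypes = []    # binary prototype vectors, one per learned category
    assignments = []
    for p in patterns:
        p = np.asarray(p, dtype=bool)
        winner = -1
        # Rank existing categories by the ART1 choice function
        # |p AND w_j| / (beta + |w_j|), best first.
        order = sorted(
            range(len(prototypes)),
            key=lambda j: -(p & prototypes[j]).sum()
                          / (beta + prototypes[j].sum()))
        for j in order:
            match = (p & prototypes[j]).sum() / max(p.sum(), 1)
            if match >= rho:                       # vigilance test passed: resonance
                prototypes[j] = p & prototypes[j]  # fast-learning prototype update
                winner = j
                break
        if winner < 0:                             # no resonance: create new category
            prototypes.append(p.copy())
            winner = len(prototypes) - 1
        assignments.append(winner)
    return prototypes, assignments
```

Each pattern is seen exactly once: it either resonates with an existing prototype (which is then intersected with it) or founds a new category, giving the extensional description that a supervised one-pass learner such as CNM could then characterize intensionally.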

[1]  Joseph P. Bigus,et al.  Data mining with neural networks: solving business problems from application development to decision support , 1996 .

[2]  Hércules Antonio do Prado,et al.  A parsimonious generation of combinatorial neural model , 1998 .

[4]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[5]  Paulo Martins Engel,et al.  Accuracy Tuning on Combinatorial Neural Model , 1999, PAKDD.

[6]  S. Grossberg,et al.  Neural Dynamics of Category Learning and Recognition: Attention, Memory Consolidation, and Amnesia , 1987 .

[7]  David M. Skapura,et al.  Neural networks - algorithms, applications, and programming techniques , 1991, Computation and neural systems series.

[8]  R. J. Machado Handling knowledge in high order neural networks: the combinatorial neural model , 1989, International 1989 Joint Conference on Neural Networks.

[9]  Mark W. Altom,et al.  Correlated symptoms and simulated medical classification. , 1982, Journal of experimental psychology. Learning, memory, and cognition.

[10]  Wolfgang Pree,et al.  Optimization of the combinatorial neural model , 1998, Proceedings 5th Brazilian Symposium on Neural Networks (Cat. No.98EX209).

[11]  D. Medin,et al.  The role of theories in conceptual coherence. , 1985, Psychological review.

[12]  Valmir Carneiro Barbosa,et al.  Learning in the combinatorial neural model , 1998, IEEE Trans. Neural Networks.

[13]  R. Lippmann,et al.  An introduction to computing with neural nets , 1987, IEEE ASSP Magazine.

[15]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[16]  S. Wrobel Concept Formation and Knowledge Revision , 1994, Springer US.

[17]  Bonnie A. Charpentier,et al.  Supercritical fluid extraction and chromatography : techniques and applications , 1988 .

[18]  Pat Langley,et al.  The Computer-Aided Discovery of Scientific Knowledge , 1998, Discovery Science.