A general framework for learning rules from data

With the aim of obtaining understandable symbolic rules that explain a given phenomenon, we split the task of learning these rules from sensory data into two phases: a multilayer perceptron maps features into propositional variables, and a set of subsequent layers operated by a PAC-like algorithm learns Boolean expressions over these variables. The distinguishing features of this procedure are that: i) the neural network is trained to produce a Boolean output whose principal task is to discriminate between classes of inputs; ii) the symbolic part must compute rules from a family that is not known a priori; iii) the two learning systems are joined by a feedback loop based on a suitability evaluation of the computed rules. The procedure we propose builds on a computational learning paradigm recently developed in papers from theoretical computer science, artificial intelligence, and cognitive systems. The present article focuses on the information-management aspects of the procedure. We deal with the lack of prior information about the rules through learning strategies that affect both the meaning of the variables and the description length of the rules into which they combine. The paper uses the task of learning to formally discriminate among several emotional states as both a working example and a test bench for comparison with previous symbolic and subsymbolic methods in the field.
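To make the two-phase architecture concrete, the following Python sketch pairs a thresholded network layer (standing in for the trained multilayer perceptron) with a Valiant-style elimination learner for monotone conjunctions, joined by a suitability score that combines empirical error with a description-length penalty. The choice of rule family, the suitability formula, and every function name here are illustrative assumptions, not the procedure actually used in the paper.

    # Minimal sketch of the hybrid pipeline, under the assumptions stated above.
    import numpy as np

    def encode(x, W, b, threshold=0.5):
        """Phase 1 (illustrative): map real-valued features to Boolean
        propositional variables by thresholding sigmoid activations."""
        z = 1.0 / (1.0 + np.exp(-(W @ x + b)))
        return tuple(int(v > threshold) for v in z)

    def learn_monotone_conjunction(examples):
        """Phase 2 (illustrative PAC-style learner): start from the
        conjunction of all variables and drop every literal that is
        falsified by some positive example."""
        n = len(examples[0][0])
        kept = set(range(n))
        for bits, label in examples:
            if label == 1:
                kept -= {i for i in kept if bits[i] == 0}
        return kept  # indices of the variables in the learned conjunction

    def suitability(rule, examples, alpha=0.05):
        """Feedback signal: empirical error plus a small penalty on the
        rule's description length (number of literals)."""
        predict = lambda bits: int(all(bits[i] == 1 for i in rule))
        err = np.mean([predict(b) != y for b, y in examples])
        return err + alpha * len(rule)

    # Toy usage with random data standing in for sensory features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)             # hidden "phenomenon"
    W, b = rng.normal(size=(4, 8)), rng.normal(size=4)  # stand-in for a trained MLP

    boolean_data = [(encode(x, W, b), int(t)) for x, t in zip(X, y)]
    rule = learn_monotone_conjunction(boolean_data)
    print("learned conjunction over variables:", sorted(rule),
          "suitability:", round(suitability(rule, boolean_data), 3))

In the full procedure the suitability score would feed back into retraining the network or changing the meaning of the propositional variables; here it is only computed, to show where the welding point between the two learners sits.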
