Computational intelligence methods for rule-based data understanding

In many applications, black-box prediction is not satisfactory, and understanding the data is of critical importance. Approaches to understanding data typically involve logical rules, evaluate similarity to prototypes, or rely on visualization or graphical methods. This paper focuses on the extraction and use of logical rules for data understanding. All aspects of rule generation, optimization, and application are described, including the problem of finding good symbolic descriptors for continuous data, tradeoffs between accuracy and simplicity at the rule-extraction stage, and tradeoffs between rejection and error level at the rule-optimization stage. Stability of rule-based descriptions, calculation of probabilities from rules, and other related issues are also discussed. Major approaches to the extraction of logical rules based on neural networks, decision trees, machine learning, and statistical methods are introduced. Optimization and application issues for sets of logical rules are described. Applications of these methods to benchmark and real-life problems are reported and illustrated with simple logical rules for many datasets. Challenges and new directions for research are outlined.
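
The tradeoff between rejection and error level mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the data, the single threshold rule, and the names `SAMPLES`, `classify`, and `evaluate` are all hypothetical, chosen only to show that widening a "reject" band around a rule's decision boundary can lower the error rate at the cost of leaving more cases unclassified.

```python
from typing import List, Optional, Tuple

# Hypothetical toy data (feature value, true class); not taken from the paper.
SAMPLES: List[Tuple[float, str]] = [
    (0.5, "A"), (1.0, "A"), (1.5, "A"), (2.1, "B"),
    (1.9, "A"), (2.5, "B"), (3.0, "B"), (1.8, "B"),
]


def classify(x: float, threshold: float, margin: float) -> Optional[str]:
    """Apply two crisp rules with a reject band of half-width `margin`.

    IF x < threshold - margin THEN class A
    IF x > threshold + margin THEN class B
    otherwise the case is rejected (None is returned).
    """
    if x < threshold - margin:
        return "A"
    if x > threshold + margin:
        return "B"
    return None


def evaluate(threshold: float, margin: float) -> Tuple[float, float]:
    """Return (error rate, rejection rate) over all samples."""
    errors = 0
    rejected = 0
    for x, label in SAMPLES:
        predicted = classify(x, threshold, margin)
        if predicted is None:
            rejected += 1
        elif predicted != label:
            errors += 1
    n = len(SAMPLES)
    return errors / n, rejected / n


if __name__ == "__main__":
    for margin in (0.0, 0.25):
        error_rate, rejection_rate = evaluate(threshold=2.0, margin=margin)
        print(f"margin={margin}: error={error_rate}, rejected={rejection_rate}")
```

On this toy data, the rule with no reject band misclassifies one of eight cases, while a band of half-width 0.25 removes that error but leaves three cases unclassified. Which operating point is preferable depends on the relative cost of errors and rejections in the application.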
