Inductive logic programming for data mining in economics

This paper addresses the problem of data mining in Inductive Logic Programming (ILP), motivated by its application in the domain of economics. ILP systems have been widely applied to data mining classification tasks with considerable success. Their use in regression tasks has been far less successful: current systems have very limited numerical reasoning capabilities, which hinders the application of ILP to the discovery of numeric functional relationships. This paper proposes improvements to the numerical reasoning capabilities of ILP systems for dealing with regression tasks. It proposes the use of statistics-based techniques such as Model Validation and Model Selection to improve noise handling, and it introduces a new search stopping criterion inspired by the PAC learning framework. We have found these extensions essential to improving on the results of the machine learning and statistics-based algorithms used in the empirical evaluation study.
