We take a regression-based approach to the problem of induction, which is the problem of inferring general rules from specific instances. Whereas traditional regression analysis fits a numerical formula to data, we fit a logical formula to Boolean data. We can, for instance, construct an expert system by fitting rules to an expert's observed behavior. A regression-based approach has the advantage of providing tests of statistical significance as well as the other tools of regression analysis. Our approach can be extended to non-Boolean discrete data, and we argue that it is better suited to rule construction than logit and other types of categorical data analysis. We find maximum likelihood and Bayesian estimates of a best-fitting Boolean function or formula and show that Bayesian estimates are more appropriate. We also derive confidence and significance levels. We show that finding the best-fitting logical formula is a pseudo-Boolean optimization problem, and that finding the best-fitting monotone function is a network flow problem.
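As a rough illustration of what fitting a Boolean function to Boolean data can look like, the sketch below picks, for each observed input vector, the output value seen most often. This minimizes the number of disagreements with the data, which coincides with a maximum likelihood fit only under an assumption made here for illustration: each observation is flipped independently with the same error probability below 1/2. The function names, the tie-breaking rule, and the sample data are hypothetical and are not taken from the paper; the Bayesian estimates, significance tests, and monotone (network flow) fitting are not covered.

```python
from collections import defaultdict


def fit_boolean_function(observations):
    """Fit a Boolean function to (x, y) pairs by majority vote at each x.

    observations: iterable of (x, y), with x a tuple of 0/1 values and y in {0, 1}.
    Returns a dict mapping each observed x to its fitted value.
    Ties are broken toward 0, which is an arbitrary illustrative choice.
    """
    counts = defaultdict(lambda: [0, 0])  # counts[x] = [times y == 0, times y == 1]
    for x, y in observations:
        counts[tuple(x)][y] += 1
    return {x: int(c[1] > c[0]) for x, c in counts.items()}


def disagreements(fitted, observations):
    """Count the observations that the fitted function fails to reproduce."""
    return sum(fitted[tuple(x)] != y for x, y in observations)


# Hypothetical noisy observations of an expert's yes/no decisions on two inputs.
data = [
    ((0, 0), 0), ((0, 0), 0),
    ((0, 1), 1), ((0, 1), 0), ((0, 1), 1),
    ((1, 0), 1),
    ((1, 1), 1), ((1, 1), 1), ((1, 1), 0),
]

fitted = fit_boolean_function(data)
errors = disagreements(fitted, data)
```

On this sample the fitted truth table matches the rule "x1 OR x2", with two observations left unexplained. Unobserved input vectors receive no fitted value in this sketch.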