Our previous work developed SORCER, a learning system that induces a set of rules from a data set represented as a second-order decision table. Second-order decision tables are database relations in which the components of each row are sets of atomic values. These sets, interpreted as disjunctions, yield compact representations that facilitate efficient management and enhance comprehensibility. SORCER generates classifiers with a near-minimum number of rows. The induction algorithm can be viewed as a table compression technique: a table of training data is transformed into a second-order table with fewer rows by merging rows in ways that preserve consistency with the training data. In this paper we propose three new mechanisms for SORCER: (1) compression by removal of table columns, (2) inclusion of simple rules based on statistics, and (3) a method for partitioning continuous data into discrete clusters. We apply our approach to classifying clinical phenotypes of a genetic collagenous disorder, osteogenesis imperfecta, using a data set of point mutations in the COLIA1 gene. Preliminary results show that, averaged over ten 10-fold cross-validations, SORCER obtained an error estimate of 16.7%, compared with 35.1% for the decision tree learner C4.5.
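The row-merging view of induction described above can be illustrated with a minimal Python sketch. It is not SORCER's actual algorithm: the greedy pairwise merge order, the function names (`to_second_order`, `covers`, `consistent`, `compress`), and the toy data are all assumptions made for illustration, and the sketch makes no claim to reach a near-minimum number of rows. It shows only the core idea that two rows of the same class may be merged by taking component-wise set unions, provided the merged row covers no training example of a different class.

```python
from itertools import combinations


def to_second_order(examples):
    """Lift each (values, label) example to a row of singleton sets."""
    return [(tuple(frozenset([v]) for v in values), label)
            for values, label in examples]


def covers(row_sets, values):
    """A row covers an example if every value lies in the matching set."""
    return all(v in s for s, v in zip(row_sets, values))


def consistent(row_sets, label, examples):
    """True if every training example covered by the row has this label."""
    return all(lab == label
               for vals, lab in examples if covers(row_sets, vals))


def compress(examples):
    """Greedily merge same-class rows while staying consistent (illustrative)."""
    rows = list(dict.fromkeys(to_second_order(examples)))  # drop duplicates
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(rows)), 2):
            (sets_a, label_a), (sets_b, label_b) = rows[i], rows[j]
            if label_a != label_b:
                continue
            # Component-wise union: each set is read as a disjunction.
            merged = tuple(a | b for a, b in zip(sets_a, sets_b))
            if consistent(merged, label_a, examples):
                rows[i] = (merged, label_a)
                del rows[j]
                changed = True
                break  # indices shifted; restart the scan
    return rows


# Hypothetical toy data: (attribute values, class label).
examples = [
    (("red", "small"), "A"),
    (("blue", "small"), "A"),
    (("red", "large"), "B"),
    (("blue", "large"), "B"),
]
table = compress(examples)
for row_sets, label in table:
    print([sorted(s) for s in row_sets], "->", label)
```

On this toy data the four training rows collapse into two second-order rows, one per class, because the colour attribute does not discriminate between the classes.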