MOTIVATION
We present a model that learns potential causes of toxicity from positive and negative examples and predicts toxicity for the dataset used in the Predictive Toxicology Challenge (PTC). The learning model assumes that the causes of toxicity can be given as substructures common to positive examples that are not substructures of negative examples. This assumption motivates the choice of a learning model, called the JSM-method, and a language for representing chemical compounds, called the Fragmentary Code of Substructure Superposition (FCSS). With FCSS, chemical compounds are represented as sets of substructures that are 'biologically meaningful' from an expert's point of view.
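The assumption above can be illustrated with a minimal sketch, in which each compound is a set of substructure codes. This is not the authors' implementation. The descriptor names are hypothetical, only pairwise intersections of examples are considered rather than all common substructure sets, and the abstaining classification rule is an assumption about how such hypotheses would be applied.

```python
from itertools import combinations


def hypotheses(own, other):
    """Collect pairwise intersections of `own` examples that are non-empty
    and not contained in any example of `other` (simplifying assumption:
    only pairs of examples are intersected)."""
    found = set()
    for a, b in combinations(own, 2):
        common = a & b
        if common and not any(common <= example for example in other):
            found.add(common)
    return found


def predict(compound, pos_hyps, neg_hyps):
    """Assumed rule: predict only when hypotheses of exactly one sign
    are contained in the compound; otherwise abstain (return None)."""
    has_pos = any(h <= compound for h in pos_hyps)
    has_neg = any(h <= compound for h in neg_hyps)
    if has_pos and not has_neg:
        return "toxic"
    if has_neg and not has_pos:
        return "non-toxic"
    return None


# Hypothetical substructure codes; real FCSS descriptors are not shown here.
positives = [frozenset({"A", "B", "C"}), frozenset({"A", "B", "D"})]
negatives = [frozenset({"A", "E"}), frozenset({"C", "E"})]

pos_hyps = hypotheses(positives, negatives)  # candidate causes of toxicity
neg_hyps = hypotheses(negatives, positives)  # candidate causes of non-toxicity

print(predict({"A", "B", "F"}, pos_hyps, neg_hyps))
print(predict({"A", "B", "E"}, pos_hyps, neg_hyps))
```

Under this rule a compound that matches no hypothesis, or hypotheses of both signs, receives no prediction, which is one way a model of this kind could make few predictions with few errors.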
RESULTS
The chosen learning model and representation language perform comparatively well on the PTC dataset: for three sex/species groups the predictions were ROC optimal, and for one group the prediction was nearly optimal. The predictions tend to be conservative (few predictions and almost no errors), which can be explained by the specific features of the learning model.
AVAILABILITY
By request to finn@viniti.ru or serge@viniti.ru; http://ki-www2.intellektik.informatik.tu-darmstadt.de/~jsm/QDA.