Paper Information - Inductive Logic Programming for Multiple-Part Data: Applications on Structure-Activity Relationship Studies


Inductive Logic Programming (ILP) is attractive because the expressive power of first-order representation makes learned results comprehensible and makes it possible to handle complex data that include relations. However, the bottleneck in learning first-order theories is the enormous hypothesis search space, which makes existing ILP approaches inefficient compared with propositional approaches. This paper introduces an improved ILP approach that handles more efficiently a kind of data called multiple-part data, in which one instance consists of several parts as well as relations among those parts. The approach tries to find hypotheses describing the class of each training example by using both the individual and the relational characteristics of its parts, which is similar to finding common substructures among complex relational instances. Multiple-part data can be found in various domains, especially in Structure-Activity Relationship (SAR) studies, which aim to generate hypotheses describing the activities or characteristics of chemical compounds from their structures. Each compound is composed of atoms as parts and various kinds of bonds as relations among atoms. We apply the proposed algorithm to SAR studies by conducting experiments on two real-world datasets: mutagenicity in nitroaromatic compounds and dopamine antagonist compounds. The experimental results are compared with those of previous approaches to show the performance of the proposed approach.
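To make the notion of multiple-part data concrete, the following is a minimal Python sketch of how a compound could be represented as parts (atoms) plus relations (bonds), and how simple substructures shared by all examples of one class could be collected. It is only an illustration under stated assumptions, not the algorithm proposed in the paper: the class names `Atom`, `Bond`, and `Compound`, the helper functions, and the three toy compounds are all hypothetical and are not taken from the mutagenicity or dopamine antagonist datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One part of an instance (hypothetical attributes)."""
    atom_id: str
    element: str


@dataclass(frozen=True)
class Bond:
    """One relation between two parts of the same instance."""
    atom1: str
    atom2: str
    bond_type: str


@dataclass
class Compound:
    """One multiple-part instance: parts, relations among parts, class label."""
    name: str
    atoms: list
    bonds: list
    label: str


def bonded_element_pairs(compound):
    """Return the set of (element, bond_type, element) patterns in a compound."""
    element_of = {a.atom_id: a.element for a in compound.atoms}
    patterns = set()
    for b in compound.bonds:
        # Sort the two elements so the pattern ignores bond direction.
        e1, e2 = sorted((element_of[b.atom1], element_of[b.atom2]))
        patterns.add((e1, b.bond_type, e2))
    return patterns


def common_patterns(compounds, label):
    """Return the patterns present in every compound that has the given label."""
    sets = [bonded_element_pairs(c) for c in compounds if c.label == label]
    return set.intersection(*sets) if sets else set()


# Toy data, invented for illustration only.
compounds = [
    Compound(
        "toy1",
        [Atom("a1", "N"), Atom("a2", "O"), Atom("a3", "C")],
        [Bond("a1", "a2", "double"), Bond("a1", "a3", "single")],
        "active",
    ),
    Compound(
        "toy2",
        [Atom("b1", "C"), Atom("b2", "N"), Atom("b3", "O"), Atom("b4", "C")],
        [Bond("b2", "b3", "double"), Bond("b1", "b2", "single"),
         Bond("b1", "b4", "aromatic")],
        "active",
    ),
    Compound(
        "toy3",
        [Atom("c1", "C"), Atom("c2", "C")],
        [Bond("c1", "c2", "single")],
        "inactive",
    ),
]

shared = common_patterns(compounds, "active")
print(sorted(shared))
```

In this toy setting, the patterns shared by the "active" compounds play the role of a very crude common substructure. A first-order learner would instead search over clauses that combine atom and bond literals, which is where the large hypothesis space mentioned in the abstract comes from.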

Masayuki Numao | Cholwich Nattee | Sukree Sinthupinyo

[1] Ashwin Srinivasan, et al. Theories for Mutagenicity: A Study in First-Order and Feature-Based Induction, 1996, Artif. Intell.

[2] Tomás Lozano-Pérez, et al. A Framework for Multiple-Instance Learning, 1997, NIPS.

[3] Ashwin Srinivasan, et al. Mutagenesis: ILP experiments in a non-determinate biological domain, 1994.

[4] J. Ross Quinlan, et al. Learning logical definitions from relations, 1990, Machine Learning.

[5] Bernhard Pfahringer, et al. A Two-Level Learning Method for Generalized Multi-instance Problems, 2003, ECML.

[6] David D. Jensen, et al. Identifying Predictive Structures in Relational Data Using Multiple Instance Learning, 2003, ICML.

[7] Thomas Gärtner, et al. Multi-Instance Kernels, 2002, ICML.

[8] Yann Chevaleyre, et al. Solving Multiple-Instance and Multiple-Part Learning Problems with Decision Trees and Rule Sets. Application to the Mutagenesis Problem, 2001, Canadian Conference on AI.

[9] Thomas G. Dietterich, et al. Solving the Multiple Instance Problem with Axis-Parallel Rectangles, 1997, Artif. Intell.

[10] Jun Wang, et al. Solving the Multiple-Instance Problem: A Lazy Learning Approach, 2000, ICML.

[11] Yann Chevaleyre, et al. A Framework for Learning Rules from Multiple Instance Data, 2001, ECML.