A Correlation-Based Approach to Attribute Selection in Chemical Graph Mining

A large number of descriptive features is often a problem in data mining. We analyzed structure-activity data for dopamine antagonists, a task that requires selecting useful features from the numerous fragments extracted from their chemical structures. Correlation coefficients among categorical variables were used to select attributes. Chemists evaluated the rules obtained with the cascade model, and their evaluation confirmed the importance of attribute selection.
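The abstract does not say which correlation coefficient for categorical variables was used or exactly how it drives selection. The following is a minimal sketch under stated assumptions. It treats each fragment as a binary presence/absence attribute and substitutes the phi coefficient (Pearson's r computed on 0/1 data) for the paper's measure. A greedy filter first drops fragments that are too rare or too common, then discards any fragment that is nearly collinear with one already kept. The functions `phi` and `select_fragments`, the thresholds `max_abs_phi` and `min_count`, and the toy fragment table are all hypothetical.

```python
from math import sqrt
from typing import Dict, List


def phi(x: List[int], y: List[int]) -> float:
    """Phi coefficient of two 0/1 vectors (Pearson r on binary data).

    Returns 0.0 when either vector is constant, since r is undefined there.
    """
    n = len(x)
    n11 = sum(1 for a, b in zip(x, y) if a and b)  # compounds with both fragments
    n1_ = sum(x)  # compounds containing fragment x
    n_1 = sum(y)  # compounds containing fragment y
    num = n * n11 - n1_ * n_1
    den = sqrt(n1_ * (n - n1_) * n_1 * (n - n_1))
    return num / den if den else 0.0


def select_fragments(table: Dict[str, List[int]],
                     max_abs_phi: float = 0.9,  # hypothetical redundancy cutoff
                     min_count: int = 2) -> List[str]:  # hypothetical support cutoff
    """Greedy correlation filter over binary fragment columns (an assumed scheme)."""
    if not table:
        return []
    n = len(next(iter(table.values())))
    # Fragments present (or absent) in almost every compound carry little information.
    candidates = [f for f, col in table.items()
                  if min_count <= sum(col) <= n - min_count]
    # Visit frequent fragments first so they become the kept representatives.
    candidates.sort(key=lambda f: (-sum(table[f]), f))
    kept: List[str] = []
    for f in candidates:
        if all(abs(phi(table[f], table[k])) < max_abs_phi for k in kept):
            kept.append(f)
    return kept


# Hypothetical toy table: rows are 6 compounds, 1 = fragment present.
fragments = {
    "C=O":      [1, 1, 1, 0, 0, 0],
    "C(=O)N":   [1, 1, 1, 0, 0, 0],  # perfectly correlated with C=O
    "c1ccccc1": [1, 0, 1, 1, 0, 1],
    "[N+]":     [0, 0, 0, 0, 0, 1],  # occurs once, so it is filtered out
    "Cl":       [0, 1, 0, 1, 1, 0],
}
print(select_fragments(fragments))  # ['c1ccccc1', 'C(=O)N', 'Cl']
```

Visiting frequent fragments first means each redundant group is represented by its most common member. Other tie-breaking rules, such as preferring the fragment most correlated with activity, would be equally reasonable choices.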
