Dunn's index for cluster tendency assessment of pharmacological data sets.

Cluster tendency assessment is an important stage in cluster analysis. To this end, a group of promising techniques known as visual assessment of tendency (VAT) has emerged in the literature. With VAT, the presence of clusters can be detected easily by direct observation of a dark block structure along the main diagonal of the intensity image. Alternatively, a Dunn's index greater than 1 for a single linkage partition is a good indication of this blocklike structure. In this report, Dunn's index is applied as a novel measure of tendency to 8 pharmacological data sets represented by machine-learning-selected molecular descriptors. In all cases, the observed values are less than 1, indicating a weak tendency for the data to form compact clusters. Further results suggest an increasing relationship between Dunn's index, as a measure of cluster separability, and the classification accuracy of the various clustering algorithms tested on the same data sets.
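As an illustration of the criterion described above, the following minimal sketch computes Dunn's index (smallest between-cluster distance divided by largest within-cluster diameter) for a single linkage partition. It is not the implementation used in the report: the function names, the choice of Euclidean distance, the number of clusters k, and the toy data are all assumptions made for the example.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (assumed metric)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def single_linkage_labels(D, k):
    """Single linkage partition into k clusters.

    Merges the closest pairs first (Kruskal-style) until k groups remain,
    which is equivalent to cutting the k-1 longest edges of the minimum
    spanning tree. Hypothetical helper, for illustration only.
    """
    n = D.shape[0]
    labels = np.arange(n)
    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(D[rows, cols], kind="stable")
    n_clusters = n
    for idx in order:
        if n_clusters == k:
            break
        a, b = labels[rows[idx]], labels[cols[idx]]
        if a != b:
            labels[labels == b] = a  # merge cluster b into cluster a
            n_clusters -= 1
    # Relabel clusters as 0..k-1.
    _, labels = np.unique(labels, return_inverse=True)
    return labels


def dunn_index(D, labels):
    """Minimum between-cluster distance over maximum cluster diameter."""
    ids = np.unique(labels)
    if ids.size < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    max_diameter = max(
        D[np.ix_(labels == c, labels == c)].max() for c in ids
    )
    min_separation = min(
        D[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    )
    if max_diameter == 0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


# Toy data (made up): two compact, well-separated groups of three points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
D = pairwise_distances(X)
labels = single_linkage_labels(D, k=2)
di = dunn_index(D, labels)
print(f"Dunn's index: {di:.3f}")  # values above 1 suggest a blocklike structure
```

In this sketch a value above 1 means that every between-cluster distance exceeds every within-cluster distance, which corresponds to the dark diagonal blocks in the intensity image; values below 1, as reported for the 8 data sets, mean that at least one cluster is wider than the smallest gap between clusters.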
