Embedded clustering systems with biological data mining applications

Mining biological and biomedical data is an emerging area at the intersection of biological/biomedical research and computer science. Clustering methods are among the core components of many data mining studies, where they are typically embedded within a larger system to solve real-world problems that require a comprehensive systems approach. This thesis introduces several embedded clustering systems for selected biological problems. It discusses both the clustering algorithm itself and its application from a whole-system view. Chapter 2 introduces MABAC, a new clustering algorithm, along with appropriate tests. Chapter 3 presents SEQOPTICS, a protein sequence clustering method, using data sets extracted from Internet databases; its results are compared with those of two other clustering methods. In Chapter 4, an allosteric network of ligand-gated ion channels (LGICs) is discovered by clustering on statistical coupling and correlation analysis. Chapter 5 introduces IMAR, a data mining system for movement identification and analysis in stroke rehabilitation procedures, in which clustering, classification, and database techniques are integrated in a systems context. The algorithm and the biological data mining systems that embed clustering are described chapter by chapter, and results are presented and evaluated in each chapter.