Bioinformatics Data Analysis Using an Artificial Immune Network

This work proposes a method for clustering gene expression data that combines an artificial immune network, named aiNet, with the minimal spanning tree (MST). aiNet is an artificial immune system (AIS) inspired by immune network theory. Its main role is to compress the data and to identify the portions of the input space that are representative of a given data set. The output of aiNet is a set of antibodies that represents the data set in a simplified way. The MST is then built on this network, and clusters are determined by a new method for detecting the inconsistent edges of the tree. An important advantage of this technique over classical approaches such as hierarchical clustering is that it requires no prior knowledge of the number of clusters or of their distributions. The hybrid algorithm was first applied to a benchmark data set to demonstrate its validity, and its results were compared with those produced by other approaches from the literature. When the algorithm was applied to the full yeast S. cerevisiae gene expression data set, it detected a strong interconnection among the genes, which hinders the perception of the inconsistencies that would lead to the separation of the data into clusters.
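
The MST stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the aiNet compression step is not reproduced, so the `prototypes` list is a hard-coded stand-in for the antibodies aiNet would output. The inconsistency test is a simple neighbourhood rule in the spirit of classical MST clustering (an edge is cut when it is much longer than the edges adjacent to it), not the new detection method the abstract refers to. The function names and the `factor=2.0` threshold are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def prim_mst(points):
    """Build an MST over the points with Prim's algorithm; return (i, j, weight) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection of each point to the tree
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cut_inconsistent(edges, factor=2.0):
    """Drop edges longer than `factor` times the mean weight of their adjacent edges.

    Illustrative criterion only (assumption), not the paper's detection method.
    """
    incident = defaultdict(list)
    for idx, (i, j, _) in enumerate(edges):
        incident[i].append(idx)
        incident[j].append(idx)
    kept = []
    for idx, (i, j, w) in enumerate(edges):
        nearby = [edges[k][2] for k in set(incident[i] + incident[j]) if k != idx]
        if nearby and w > factor * (sum(nearby) / len(nearby)):
            continue  # edge is inconsistent with its neighbourhood: cut it
        kept.append((i, j, w))
    return kept


def cluster_labels(n, edges):
    """Label each point by its connected component in the pruned tree (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in edges:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical prototypes standing in for aiNet antibodies: two well-separated groups.
prototypes = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
mst = prim_mst(prototypes)
labels = cluster_labels(len(prototypes), cut_inconsistent(mst))
print(labels)
```

In this sketch the number of clusters is never supplied: it follows from how many edges the criterion cuts. It also shows why strongly interconnected data are hard to split, since no edge stands out from its neighbours when all MST edges have similar lengths.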
