The study of applying a systematic procedure based on SOFM clustering technique into organism clustering

In the biological industry, it is commonly assumed that the coding structure of nucleotide sequences differs from one organism to another. Analyzing that coding structure can therefore yield useful information for clustering biological organisms. From the viewpoint of bioinformatics, the primary contribution of such analysis is a better understanding of the origin of living organisms. In this article, we propose a systematic procedure that addresses this problem by combining data transformation, dimension reduction, and a clustering technique (the self-organizing feature map, SOFM) applied to DNA sequences. Biologists can use the proposed procedure to obtain an initial sense of how biological organisms should be classified, which speeds up their analysis. An illustrative example demonstrates the feasibility and rationality of the proposed procedure.
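To make the idea of clustering organisms from their DNA sequences more concrete, the following is a minimal sketch of a transform-then-cluster pipeline. It is not the procedure of the article, whose details the abstract does not give. It assumes that the data transformation is a dinucleotide (2-mer) frequency vector and that the clustering step is a small one-dimensional SOFM. The dimension-reduction step is left out because the abstract does not say which method is used. All function names, parameter values, and the toy sequences are hypothetical.

```python
import math
import random
from itertools import product


def kmer_frequencies(seq, k=2):
    """Assumed data transformation: relative frequency of every k-mer over ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vector):
    """Index of the map unit whose weight vector is closest to `vector`."""
    return min(range(len(weights)),
               key=lambda j: squared_distance(weights[j], vector))


def train_sofm(vectors, n_units=2, epochs=100, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOFM. Assumes n_units <= len(vectors).

    Units are initialised from evenly spaced input samples (a simplification);
    the learning rate and neighbourhood width shrink linearly over the epochs.
    """
    rng = random.Random(seed)
    step = len(vectors) / n_units
    weights = [list(vectors[int(j * step)]) for j in range(n_units)]
    order = list(range(len(vectors)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.1)
        rng.shuffle(order)
        for idx in order:
            x = vectors[idx]
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of map units
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Synthetic toy sequences (not real genomes): two AT-rich, two GC-rich.
toy = {
    "at_rich_1": "ATTATAATTAAATTTA" * 5,
    "at_rich_2": "TTAATATAAATTATAT" * 5,
    "gc_rich_1": "GCCGCGGCCGGGCCCG" * 5,
    "gc_rich_2": "CCGGCGCGGGCCGCGC" * 5,
}
vectors = [kmer_frequencies(s) for s in toy.values()]
weights = train_sofm(vectors)
labels = {name: best_matching_unit(weights, v) for name, v in zip(toy, vectors)}
print(labels)
```

Each sequence is assigned to the map unit whose weight vector lies closest to its frequency vector, so sequences with similar composition end up on the same unit. With real genomes one would likely use a larger map and add a dimension-reduction step before training, as the abstract describes.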
