Fuzzy C-means for inferring functional coupling of proteins from their phylogenetic profiles

This paper explores the use of the fuzzy C-means clustering algorithm for inferring functional relationships among proteins from their phylogenetic profiles. We conducted a series of experiments using phylogenetic profiles provided by the Clusters of Orthologous Groups of proteins (COG) database. The results show that the greater expressiveness of fuzzy C-means, which allows a protein to belong to several clusters with graded membership, enables the discovery of functional relationships in E. coli operons that traditional methods fail to detect.
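To make the idea concrete, the following is a minimal sketch of standard fuzzy C-means applied to binary phylogenetic profiles (1 if a protein has an ortholog in a genome, 0 otherwise). It is an illustration under stated assumptions, not the experimental setup of the paper: the protein names, the toy profiles, the number of clusters, the fuzzifier `m = 2`, and the use of squared Euclidean distance are all hypothetical choices made for this example.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Textbook fuzzy C-means on a list of equal-length numeric vectors.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Memberships are proportional to (squared distance) ** (-1 / (m - 1)).
        new_u = []
        for p in points:
            d2 = [sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in centers]
            if min(d2) == 0.0:
                # Point coincides with a center: share membership among those.
                hits = [1.0 if x == 0.0 else 0.0 for x in d2]
                new_u.append([h / sum(hits) for h in hits])
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                inv_sum = sum(inv)
                new_u.append([v / inv_sum for v in inv])

        shift = max(abs(new_u[i][k] - u[i][k])
                    for i in range(n) for k in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical toy profiles over six genomes (not data from the COG database).
profiles = {
    "protA": [1, 1, 1, 0, 0, 0],
    "protB": [1, 1, 1, 0, 0, 0],
    "protC": [0, 0, 0, 1, 1, 1],
    "protD": [0, 0, 0, 1, 1, 1],
    "protE": [1, 1, 1, 1, 1, 1],  # present everywhere: fits both groups partly
}
names = list(profiles)
centers, memberships = fuzzy_c_means([profiles[p] for p in names], n_clusters=2)

for name, row in zip(names, memberships):
    print(name, [round(v, 3) for v in row])
```

In this toy setting a hard clustering method would have to place `protE` in exactly one group, whereas the membership row returned here can split its weight between both, which is the kind of graded assignment the abstract refers to.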
