Data-Mining Protein Structure by Clustering, Segmentation and Evolutionary Algorithms

This contribution discusses the principles of bioinformatics, data mining, and evolutionary computing, and where these fields intersect. Data mining by means of selected evolutionary techniques is discussed with attention to protein structure, segmentation, and the state of the art in the use of evolutionary algorithms. The basic principles and terminology of evolutionary algorithms are introduced, together with two specific evolutionary algorithms: differential evolution and the self-organizing migrating algorithm.
