Paper information - Searching for hypothetical proteins: Theory and practice based upon original data and literature

A large part of mammalian proteomes consists of hypothetical proteins (HP), i.e. proteins predicted from nucleic acid sequences only, and protein sequences of unknown function. Databases are far from complete, and errors are to be expected. This legion of HP awaits experiments demonstrating their existence at the protein level, and subsequent bioinformatic handling to assign each protein a tentative function is mandatory. Two-dimensional gel electrophoresis with subsequent mass spectrometric identification of protein spots is an appropriate tool for searching for HP in high-throughput mode. Spots are identified by MS or MS/MS measurements (MALDI-TOF, MALDI-TOF-TOF) followed by analysis with software such as Mascot or ProFound. In many cases proteins can thus be unambiguously identified and characterised; if not, de novo sequencing or Q-TOF analysis is warranted. If the protein is still not identified, the sequence is submitted to databases for BLAST searches to determine identities, similarities or homologies to known proteins. If no significant identity to known structures is observed, the protein sequence is examined for functional domains (databases PROSITE, PRINTS, InterPro, ProDom, Pfam and SMART) and searched for motifs (ELM); finally, protein-protein interaction databases (InterWeaver, STRING) are consulted or predictions from conformations are performed. We here provide information about hypothetical proteins in terms of protein chemical analysis, which is independent of antibody availability and specificity, and of bioinformatic handling, in order to contribute to the extension and completion of protein databases. We include original work on HP in the brain to illustrate the processes of HP identification and functional assignment.
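The stepwise strategy in the abstract can be read as a fallback cascade, and a minimal Python sketch may make the ordering easier to follow. It assumes a simplified reading: the three identification steps are tried in order and stop at the first success, and only then are the function-hint steps (domains, motifs, interactions) all consulted. The function `run_cascade`, the stage callables and the `record` dictionary are hypothetical placeholders for illustration; they are not the authors' software and do not call Mascot, ProFound, BLAST or any of the named databases.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage is a (label, function) pair. The function takes a record describing
# a protein spot or sequence and returns a finding, or None if nothing is found.
# All stage functions below are hypothetical stand-ins, not real tool APIs.
Stage = Tuple[str, Callable[[dict], Optional[str]]]


def run_cascade(record: dict,
                identification: List[Stage],
                function_hints: List[Stage]) -> dict:
    """Try identification stages in order; if all fail, collect function hints."""
    tried: List[str] = []

    # Phase 1: stop at the first stage that identifies the protein.
    for name, stage in identification:
        tried.append(name)
        finding = stage(record)
        if finding is not None:
            return {"identified_by": name, "finding": finding,
                    "hints": {}, "tried": tried}

    # Phase 2: no identification, so gather every available functional hint.
    hints: Dict[str, str] = {}
    for name, stage in function_hints:
        tried.append(name)
        finding = stage(record)
        if finding is not None:
            hints[name] = finding

    return {"identified_by": None, "finding": None,
            "hints": hints, "tried": tried}


# Toy stages that simply look up precomputed (made-up) fields in the record.
identification: List[Stage] = [
    ("MS or MS/MS database search", lambda r: r.get("ms_hit")),
    ("de novo sequencing or Q-TOF", lambda r: r.get("denovo_hit")),
    ("BLAST search", lambda r: r.get("blast_hit")),
]
function_hints: List[Stage] = [
    ("functional domains", lambda r: r.get("domain")),
    ("motifs", lambda r: r.get("motif")),
    ("protein-protein interactions", lambda r: r.get("interaction")),
]

# A made-up record in which only the domain search returns something.
record = {"domain": "toy domain hit"}
result = run_cascade(record, identification, function_hints)
print(result["identified_by"], result["hints"])
```

In a real workflow each lambda would be replaced by a call to the corresponding search tool or database, and a record identified in the first phase would typically carry a score or significance value rather than a bare string.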
