Automata Learning and Stochastic Modeling for Biosequence Analysis
[1] Nathan Linial,et al. ProtoMap: automatic classification of protein sequences and hierarchy of protein families , 2000, Nucleic Acids Res..
[2] Hanah Margalit,et al. PromEC: An updated database of Escherichia coli mRNA promoters with experimentally identified transcriptional start sites , 2001, Nucleic Acids Res..
[3] Frans M. J. Willems,et al. The context-tree weighting method: basic properties , 1995, IEEE Trans. Inf. Theory.
[4] W. Pearson. Comparison of methods for searching protein sequence databases , 1995, Protein science : a publication of the Protein Society.
[5] Philip M. Lewis,et al. The characteristic selection problem in recognition systems , 1962, IRE Trans. Inf. Theory.
[6] Zukang Feng,et al. The Protein Data Bank and structural genomics , 2003, Nucleic Acids Res..
[7] W. Zander,et al. The Hebrew University , 1998 .
[8] Naftali Tishby,et al. Markovian domain fingerprinting: statistical segmentation of protein sequences , 2001, Bioinform..
[9] Alfred V. Aho,et al. Efficient string matching , 1975, Commun. ACM.
[10] Stanley F. Chen,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.
[11] Dana Ron,et al. The power of amnesia: Learning probabilistic automata with variable memory length , 1996, Machine Learning.
[12] C. Branden,et al. Introduction to protein structure , 1991 .
[13] K. Yoshida,et al. Foldability of barnase mutants obtained by permutation of modules or secondary structure units. , 1999, Journal of molecular biology.
[14] W. Taylor,et al. The classification of amino acid conservation. , 1986, Journal of theoretical biology.
[15] Gill Bejerano. Efficient exact value computation and applications to biosequence analysis , 2003, RECOMB '03.
[16] Yoshua Bengio,et al. Markovian Models for Sequential Data , 2004 .
[17] M. F.,et al. Bibliography , 1985, Experimental Gerontology.
[18] L Holm,et al. Towards a covering set of protein family profiles. , 2000, Progress in biophysics and molecular biology.
[19] R. Glockshuber,et al. Random circular permutation of DsbA reveals segments that are essential for protein folding and stability. , 1999, Journal of molecular biology.
[20] Alberto Apostolico,et al. Optimal amnesic probabilistic automata or how to learn and classify proteins in linear time and space , 2000, RECOMB '00.
[21] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[22] Maria Jesus Martin,et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 , 2003, Nucleic Acids Res..
[23] Sean R. Eddy,et al. Profile hidden Markov models , 1998, Bioinform..
[24] Anton J. Enright,et al. An efficient algorithm for large-scale detection of protein families. , 2002, Nucleic acids research.
[25] Stefano Toppo,et al. Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices , 2002, Bioinform..
[26] P. Bork,et al. Protein sequence motifs. , 1996, Current opinion in structural biology.
[27] Naftali Tishby,et al. Discriminative Feature Selection via Multiclass Variable Memory Markov Model , 2002, EURASIP J. Adv. Signal Process..
[28] Shmuel Pietrokovski,et al. Increased coverage of protein families with the Blocks Database servers , 2000, Nucleic Acids Res..
[29] Ron Unger,et al. Swaps in protein sequences , 2002, Proteins.
[30] David R. Gilbert,et al. Approaches to the Automatic Discovery of Patterns in Biosequences , 1998, J. Comput. Biol..
[31] Michael Sipser,et al. Inference and minimization of hidden Markov chains , 1994, COLT '94.
[32] Naftali Tishby,et al. Unsupervised Sequence Segmentation by a Mixture of Switching Variable Memory Markov Sources , 2001, ICML.
[33] Arne Elofsson,et al. A comparison of sequence and structure protein domain families as a basis for structural genomics , 1999, Bioinform..
[34] Sung-Hou Kim,et al. Electron transfer by domain movement in cytochrome bc1 , 1998, Nature.
[35] Vineet Bafna,et al. Pattern Matching Algorithms , 1997 .
[36] E T Stuart,et al. Mammalian Pax genes. , 1994, Annual review of genetics.
[37] Pierre Dupont,et al. Improved Smoothing for Probabilistic Suffix Trees Seen as Variable Order Markov Chains , 2002, ECML.
[38] Anton J. Enright,et al. GeneRAGE: a robust algorithm for sequence clustering and domain detection , 2000, Bioinform..
[39] D. Eisenberg,et al. Computational methods of analysis of protein-protein interactions. , 2003, Current opinion in structural biology.
[40] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[41] M S Waterman,et al. Identification of common molecular subsequences. , 1981, Journal of molecular biology.
[42] Alex Bateman,et al. The InterPro Database, 2003 brings increased coverage and new features , 2003, Nucleic Acids Res..
[43] Raphail E. Krichevsky,et al. The performance of universal encoding , 1981, IEEE Trans. Inf. Theory.
[44] Jorma Rissanen,et al. The Minimum Description Length Principle in Coding and Modeling , 1998, IEEE Trans. Inf. Theory.
[45] Daniel Povey,et al. Large scale discriminative training for speech recognition , 2000 .
[46] N. Wicker,et al. Secator: a program for inferring protein subfamilies from phylogenetic trees. , 2001, Molecular biology and evolution.
[47] Tim J. P. Hubbard,et al. SCOP database in 2002: refinements accommodate structural genomics , 2002, Nucleic Acids Res..
[48] A. Valencia,et al. Automatic methods for predicting functionally important residues. , 2003, Journal of molecular biology.
[49] Thomas G. Dietterich,et al. Learning with Many Irrelevant Features , 1991, AAAI.
[50] Anton J. Enright,et al. Protein interaction maps for complete genomes based on gene fusion events , 1999, Nature.
[51] Burkhard Rost,et al. Domains, motifs and clusters in the protein universe. , 2003, Current opinion in chemical biology.
[52] Andreas Stolcke,et al. Entropy-based Pruning of Backoff Language Models , 2000, ArXiv.
[53] S. Salzberg,et al. Interpolated Markov models for eukaryotic gene finding. , 1999, Genomics.
[54] Naoki Abe,et al. On the computational complexity of approximating distributions by probabilistic automata , 1990, Machine Learning.
[55] M. A. Basharov. Cotranslational Folding of Proteins , 2004, Biochemistry (Moscow).
[56] Jérôme Gouzy,et al. Whole Genome Protein Domain Analysis using a New Method for Domain Clustering , 1999, Comput. Chem..
[57] Jun S. Liu,et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.
[58] Golan Yona,et al. Modeling protein families using probabilistic suffix trees , 1999, RECOMB.
[59] G. F. Hughes,et al. On the mean accuracy of statistical pattern recognizers , 1968, IEEE Trans. Inf. Theory.
[60] Owen White,et al. The TIGRFAMs database of protein families , 2003, Nucleic Acids Res..
[61] A. Fedorov,et al. Contribution of cotranslational folding to the rate of formation of native protein structure. , 1995, Proceedings of the National Academy of Sciences of the United States of America.
[62] S. Altschul. Amino acid substitution matrices from an information theoretic perspective , 1991, Journal of Molecular Biology.
[63] A. Fersht,et al. Folding of circular and permuted chymotrypsin inhibitor 2: retention of the folding nucleus. , 1998, Biochemistry.
[64] Chris Sander,et al. Touring protein fold space with Dali/FSSP , 1998, Nucleic Acids Res..
[65] H. Margalit,et al. Novel small RNA-encoding genes in the intergenic regions of Escherichia coli , 2001, Current Biology.
[66] C Kulikowski,et al. Automatic discovery of sub-molecular sequence domains in multi-aligned sequences: a dynamic programming algorithm for multiple alignment segmentation. , 2000, Journal of theoretical biology.
[67] David L. Eaton,et al. Glutathione S‐transferases: Amino acid sequence comparison, classification and phylogenetic relationship , 1992 .
[68] J. Thompson,et al. Multiple sequence alignment with Clustal X. , 1998, Trends in biochemical sciences.
[69] Cathy H. Wu,et al. iProClass: an integrated database of protein family, function and structure information , 2003, Nucleic Acids Res..
[70] J. Thompson,et al. Using CLUSTAL for multiple sequence alignments. , 1996, Methods in enzymology.
[71] S. Henikoff,et al. Automated assembly of protein blocks for database searching. , 1991, Nucleic acids research.
[72] Sean R. Eddy,et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .
[73] A. D. McLachlan,et al. Profile analysis: detection of distantly related proteins. , 1987, Proceedings of the National Academy of Sciences of the United States of America.
[74] A. C. May,et al. Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies , 2002, Protein science : a publication of the Protein Society.
[75] R. A. George,et al. Protein domain identification and improved sequence similarity searching using PSI‐BLAST , 2002, Proteins.
[76] Yoram Singer,et al. The Hierarchical Hidden Markov Model: Analysis and Applications , 1998, Machine Learning.
[77] Jiye Shi,et al. HOMSTRAD: adding sequence information to structure-based alignments of homologous protein families , 2001, Bioinform..
[78] P. Bork,et al. Protein domain analysis in the era of complete genomes , 2002, FEBS letters.
[79] C. Ponting,et al. On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? , 2001, Journal of structural biology.
[80] C Ouzounis,et al. Dictionary building via unsupervised hierarchical motif discovery in the sequence space of natural proteins , 1999, Proteins.
[81] Jeffrey E. F. Friedl. Mastering Regular Expressions , 1997 .
[82] Burkhard Rost,et al. Target space for structural genomics revisited , 2002, Bioinform..
[83] Padhraic Smyth,et al. Decision tree design from a communication theory standpoint , 1988, IEEE Trans. Inf. Theory.
[84] Khar Heng Choo,et al. Recent Applications of Hidden Markov Models in Computational Biology , 2004 .
[85] Walter R. Gilks,et al. Modeling the percolation of annotation errors in a database of protein sequences , 2002, Bioinform..
[86] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[87] Gill Bejerano. Algorithms for variable length Markov chain modeling , 2004, Bioinform..
[88] William M. Campbell,et al. Mutual Information in Learning Feature Transformations , 2000, ICML.
[89] Jérôme Gouzy,et al. ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons , 2000, Nucleic Acids Res..
[90] Sean R. Eddy,et al. HMMER User's Guide - Biological sequence analysis using profile hidden Markov models , 1998 .
[91] P Argos,et al. DOMO: a new database of aligned protein domains. , 1998, Trends in biochemical sciences.
[92] J. Parker. Amino Acid Substitution , 2001 .
[93] Golan Yona,et al. Variations on probabilistic suffix trees: statistical modeling and prediction of protein families , 2001, Bioinform..
[94] Sam Griffiths-Jones,et al. The use of structure information to increase alignment accuracy does not aid homologue detection with profile HMMs , 2002, Bioinform..
[95] Charles Elkan,et al. Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.
[96] Stephen H. Bryant,et al. Domain size distributions can predict domain boundaries , 2000, Bioinform..
[97] D. Haussler,et al. Hidden Markov models in computational biology. Applications to protein modeling. , 1993, Journal of molecular biology.
[98] Ran El-Yaniv,et al. Agnostic Classification of Markovian Sequences , 1997, NIPS.
[99] Ori Sasson,et al. ProtoNet: hierarchical classification of the protein space , 2003, Nucleic Acids Res..
[100] M. Grossmann,et al. G Protein-coupled Receptors , 1998, The Journal of Biological Chemistry.
[101] Amos Bairoch,et al. PROSITE: A Documented Database Using Patterns and Profiles as Motif Descriptors , 2002, Briefings Bioinform..
[102] R. A. George,et al. Snapdragon: a Method to Delineate Protein Structural Domains from Sequence Data , 2002 .
[103] A. Valencia,et al. Correlated mutations contain information about protein-protein interaction. , 1997, Journal of molecular biology.
[104] James E. Bray,et al. The CATH database: an extended protein family resource for structural and functional genomics , 2003, Nucleic Acids Res..
[105] C. Reynolds,et al. Correlated mutations amongst the external residues of G-protein coupled receptors. , 1997, Biochemical Society transactions.
[106] C. Mcwherter,et al. Circular permutation of granulocyte colony-stimulating factor. , 1999, Biochemistry.
[107] Eleazar Eskin,et al. Protein Family Classification Using Sparse Markov Transducers , 2000, ISMB.
[108] C. Chothia,et al. The geometry of domain combination in proteins. , 2002, Journal of molecular biology.
[109] H A Scheraga,et al. Lattice neural network minimization. Application of neural network optimization for locating the global-minimum conformations of proteins. , 1993, Journal of molecular biology.
[110] Edward M. McCreight,et al. A Space-Economical Suffix Tree Construction Algorithm , 1976, JACM.
[111] David Haussler,et al. What Size Net Gives Valid Generalization? , 1989, Neural Computation.
[112] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[113] David Haussler,et al. The Smallest Automaton Recognizing the Subwords of a Text , 1985, Theor. Comput. Sci..
[114] R. Russell,et al. Analysis and prediction of functional sub-types from protein sequence alignments. , 2000, Journal of molecular biology.
[115] J B Hurley,et al. Two amino acid substitutions convert a guanylyl cyclase, RetGC-1, into an adenylyl cyclase. , 1998, Proceedings of the National Academy of Sciences of the United States of America.
[116] John D. Lafferty,et al. Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..
[117] J. Moody,et al. Feature Selection Based on Joint Mutual Information , 1999 .
[118] W R Pearson,et al. Flexible sequence similarity searching with the FASTA3 program package. , 2000, Methods in molecular biology.
[119] Shlomo Dubnov,et al. Using Machine-Learning Methods for Musical Style Modeling , 2003, Computer.
[120] Ronitt Rubinfeld,et al. On the learnability of discrete distributions , 1994, STOC '94.
[121] P. Bühlmann,et al. Variable Length Markov Chains: Methodology, Computing, and Software , 2004 .
[122] J A Epstein,et al. Crystal structure of the human Pax6 paired domain-DNA complex reveals specific roles for the linker region and carboxy-terminal subdomain in DNA binding. , 1999, Genes & development.
[123] Nir Friedman,et al. A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites , 2001, WABI.
[124] Naftali Tishby,et al. Efficient Exact p-Value Computation for Small Sample, Sparse, and Surprising Categorical Data , 2004, J. Comput. Biol..
[125] Susan T. Dumais,et al. Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.
[126] Martin Vingron,et al. The SYSTERS protein sequence cluster set , 2000, Nucleic Acids Res..
[127] T. N. Bhat,et al. The Protein Data Bank , 2000, Nucleic Acids Res..
[128] J Roca,et al. The mechanisms of DNA topoisomerases. , 1995, Trends in biochemical sciences.
[129] Peter Bühlmann,et al. Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm , 2000 .
[130] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[131] Dan Gusfield,et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .
[132] Peter Weiner,et al. Linear Pattern Matching Algorithms , 1973, SWAT.
[133] J. Hayes,et al. The glutathione S-transferase supergene family: regulation of GST and the contribution of the isoenzymes to cancer chemoprotection and drug resistance. , 1995, Critical reviews in biochemistry and molecular biology.
[134] M. Gerstein,et al. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. , 2000, Journal of molecular biology.
[135] Esko Ukkonen,et al. On-line construction of suffix trees , 1995, Algorithmica.
[136] James E. Johnson,et al. MetaFam: a unified classification of protein families. II. Schema and query capabilities , 2001, Bioinform..
[137] D T Jones,et al. A systematic comparison of protein structure classifications: SCOP, CATH and FSSP. , 1999, Structure.
[138] Terri K. Attwood,et al. PRINTS and its automatic supplement, prePRINTS , 2003, Nucleic Acids Res..
[139] Stefano Lonardi,et al. Efficient Detection of Unusual Words , 2000, J. Comput. Biol..
[140] Peer Bork,et al. Recent improvements to the SMART domain-based sequence annotation resource , 2002, Nucleic Acids Res..
[141] Dana Angluin,et al. Learning Markov chains with variable memory length from noisy output , 1997, COLT '97.
[142] Liisa Holm,et al. Picasso: generating a covering set of protein family profiles , 2001, Bioinform..
[143] Roberto Battiti,et al. Using mutual information for selecting features in supervised neural net learning , 1994, IEEE Trans. Neural Networks.
[144] G. Barrows,et al. A mutual information measure for feature selection with application to pulse classification , 1996, Proceedings of Third International Symposium on Time-Frequency and Time-Scale Analysis (TFTS-96).
[145] Mikhail A. Roytberg,et al. Segmentation of long genomic sequences into domains with homogeneous composition with BASIO software , 2001, Bioinform..
[146] Rolf Apweiler,et al. Improvements to CluSTr: the database of SWISS-PROT+TrEMBL protein clusters , 2003, Nucleic Acids Res..
[147] Jérôme Gracy,et al. Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment , 1998, Bioinform..
[148] S. Henikoff,et al. Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.
[149] Renato De Mori,et al. High-performance connected digit recognition using maximum mutual information estimation , 1994, IEEE Trans. Speech Audio Process..
[150] Jorja G. Henikoff,et al. Using substitution probabilities to improve position-specific scoring matrices , 1996, Comput. Appl. Biosci..
[151] Anders Krogh,et al. SAM: Sequence Alignment and Modeling Software System , 1995 .
[152] Jorma Rissanen,et al. A universal data compression system , 1983, IEEE Trans. Inf. Theory.
[153] David Haussler,et al. Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology , 1996, Comput. Appl. Biosci..
[154] K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems , 1998, Proc. IEEE.
[155] Imre Csiszár,et al. On the computation of rate-distortion functions (Corresp.) , 1974, IEEE Trans. Inf. Theory.
[156] D. Eisenberg,et al. Detecting protein function and protein-protein interactions from genome sequences. , 1999, Science.
[157] A. Mees,et al. Context-tree modeling of observed symbolic dynamics. , 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.
[158] Lalit R. Bahl,et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[159] William Noble Grundy,et al. Meta-MEME: motif-based hidden Markov models of protein families , 1997, Comput. Appl. Biosci..
[160] Jorma Rissanen,et al. Stochastic Complexity in Statistical Inquiry , 1989, World Scientific Series in Computer Science.
[161] Dana Ron,et al. Learning to model sequences generated by switching distributions , 1995, COLT '95.
[162] L. Wu,et al. Autonomous protein folding units. , 2000, Advances in protein chemistry.
[163] Golan Yona,et al. Towards a Complete Map of the Protein Space Based on a Unified Sequence and Structure Analysis of All Known Proteins , 2000, ISMB.