HAKKE: A Multi-Strategy Prediction System for Sequences

We developed HAKKE, a machine learning system for predicting functional regions in sequences, with applications such as protein-coding region prediction and transmembrane domain prediction. HAKKE is a hybrid system in which a pool of algorithms cooperates to make an accurate prediction. The system uses an extension of the weighted majority algorithm to fit the strength of each algorithm to the given training examples. In this paper, we describe the core of the system and show some experimental results on transmembrane domain and α-helix predictions.
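To make the weighting idea concrete, the following is a minimal sketch of the basic weighted majority scheme, not of HAKKE's extension, whose details are not given here. It assumes binary labels, a penalty factor `beta`, and a common variant in which every predictor that errs on a training example is penalized. The three toy predictors, the `HYDROPHOBIC` residue set, and the training windows are hypothetical and serve only as illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A predictor maps a sequence window to a binary label (1 = inside a region).
Predictor = Callable[[str], int]

# Hypothetical residue set used only by the toy predictors below.
HYDROPHOBIC = set("AILMFVW")


def mostly_hydrophobic(window: str) -> int:
    """Toy predictor: 1 if more than half of the residues are hydrophobic."""
    return int(sum(c in HYDROPHOBIC for c in window) * 2 > len(window))


def always_negative(window: str) -> int:
    """Toy predictor: always predicts 0."""
    return 0


def starts_with_leucine(window: str) -> int:
    """Toy predictor: 1 if the window begins with leucine (L)."""
    return int(window.startswith("L"))


def train_weights(
    predictors: Sequence[Predictor],
    examples: Sequence[Tuple[str, int]],
    beta: float = 0.5,
) -> List[float]:
    """Start all weights at 1 and multiply by beta on each mistake."""
    weights = [1.0] * len(predictors)
    for window, label in examples:
        for i, predict in enumerate(predictors):
            if predict(window) != label:
                weights[i] *= beta
    return weights


def weighted_majority_vote(
    predictors: Sequence[Predictor],
    weights: Sequence[float],
    window: str,
) -> int:
    """Return the label backed by the larger total weight (ties go to 1)."""
    weight_for_1 = sum(w for p, w in zip(predictors, weights) if p(window) == 1)
    weight_for_0 = sum(weights) - weight_for_1
    return 1 if weight_for_1 >= weight_for_0 else 0


pool = [mostly_hydrophobic, always_negative, starts_with_leucine]
# Hypothetical labeled windows (1 = transmembrane-like, 0 = not).
training = [("LLIVAFLL", 1), ("DKERSTNQ", 0), ("AVILMFWG", 1), ("LKDESRTN", 0)]
weights = train_weights(pool, training)
prediction = weighted_majority_vote(pool, weights, "IIVVAALL")
```

In this sketch, predictors that err more often on the training examples end up with smaller weights, so the combined vote leans toward whichever members of the pool fit the data best.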
