On the Advantages of Multi-Input Single-Output Parallel Cascade Classifiers

Parallel Cascade Identification (PCI) has been applied successfully to build dynamic nonlinear system models that address diverse problems in bioinformatics. PCI may be used to identify either single-input single-output (SISO) or multi-input single-output (MISO) models. Although SISO PCI models have typically sufficed, it has been suggested that MISO PCI systems could also serve as bioinformatics classifiers, and they have been applied successfully in one study. This paper reports the first systematic comparison of MISO and SISO PCI classifiers. We motivate the use of the MISO structure and briefly review the construction of MISO parallel cascade models. To compare the accuracy of SISO and MISO PCI classifiers, we apply genetic algorithms to optimize the model architecture on several equivalent single-input and multi-input biological training datasets. Evaluating both model structures on independent test datasets, we establish that MISO PCI can build classifiers as accurate as those obtained with SISO PCI. We also discuss and illustrate the benefits of the MISO approach, including significantly reduced training and testing times and the ability to weight individual inputs automatically according to their information content.