BioInfoMark: A Bioinformatic Benchmark Suite for Computer Architecture Research

The exponential growth of genomic data has spurred interest in large-scale analysis of genetic information. Bioinformatics applications, which apply computational methods to help researchers sift through massive biological datasets and extract useful information, are becoming increasingly important computer workloads. This paper presents BioInfoMark, a benchmark suite of representative bioinformatics applications intended to facilitate the design and evaluation of computer architectures for these emerging workloads. Currently, the BioInfoMark suite contains 14 widely used bioinformatics tools and covers the major fields of study in computational biology, such as sequence comparison, phylogenetic analysis, protein structure analysis, and molecular dynamics simulation. The BioInfoMark package includes benchmark source code, input datasets, and information on compiling and using the benchmarks. To allow computer architecture researchers to run the BioInfoMark suite on several popular execution-driven simulators, we provide pre-compiled little-endian Alpha ISA binaries and pre-generated simulation points. The BioInfoMark package is freely available and can be downloaded from: http://www.ideal.ece.ufl.edu/BioInfoMark.
