An Open Benchmark Suite for Evaluating Computer Architecture on Bioinformatics and Life Science Applications

In this paper, we propose BIOPERF, a definitive benchmark suite of representative applications from the biology and life sciences community, where the codes are carefully selected to span a breadth of algorithms and performance characteristics. The BIOPERF suite is available from www.bioperf.org and includes benchmark source code, input datasets of various sizes, and information for compiling and using the benchmarks. We include parallel codes where available.
