An innovative approach for testing bioinformatics programs using metamorphic testing

Background: Recent advances in experimental and computational technologies have fueled the development of many sophisticated bioinformatics programs. The correctness of such programs is crucial, as incorrectly computed results may lead to wrong biological conclusions or misguide downstream experimentation. Common software testing procedures involve executing the target program with a set of test inputs and then verifying the correctness of the test outputs. However, due to the complexity of many bioinformatics programs, it is often difficult to verify the correctness of the test outputs. Our ability to perform systematic software testing is therefore greatly hindered.

Results: We propose to use a novel software testing technique, metamorphic testing (MT), to test a range of bioinformatics programs. Instead of requiring a mechanism to verify whether an individual test output is correct, the MT technique verifies whether a pair of test outputs conforms to a set of domain-specific properties, called metamorphic relations (MRs), thus greatly increasing the number and variety of test cases that can be applied. To demonstrate how MT is used in practice, we applied MT to test two open-source bioinformatics programs, namely GNLab and SeqMap. In particular, we show that MT is simple to implement and is effective in detecting faults in a real-life program and in some artificially fault-seeded programs. Further, we discuss how MT can be applied to test programs from various domains of bioinformatics.

Conclusion: This paper describes the application of a simple, effective and automated technique to systematically test a range of bioinformatics programs. We show how MT can be implemented in practice through two real-life case studies. Since many bioinformatics programs, particularly those for large-scale simulation and data analysis, are hard to test systematically, their developers may benefit from using MT as part of their testing strategy. Our work therefore represents a significant step towards software reliability in bioinformatics.
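To make the idea of checking a pair of outputs against a metamorphic relation concrete, the following is a minimal sketch in Python. It does not reproduce the relations or programs used in the paper: `map_reads` is a hypothetical toy stand-in for a short-read mapper, and the two relations (read order should not affect the set of hits; allowing more mismatches should never lose a hit) are illustrative assumptions chosen because they need no oracle for any single output.

```python
import random


def map_reads(genome, reads, max_mismatches=0):
    """Hypothetical toy mapper: return the set of (read, position) hits."""
    hits = set()
    for read in reads:
        for pos in range(len(genome) - len(read) + 1):
            window = genome[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((read, pos))
    return hits


def check_permutation_relation(genome, reads, rng):
    """Assumed MR: shuffling the input reads must not change the set of hits."""
    shuffled = reads[:]
    rng.shuffle(shuffled)
    return map_reads(genome, reads) == map_reads(genome, shuffled)


def check_mismatch_relation(genome, reads, k):
    """Assumed MR: hits with k mismatches are a subset of hits with k + 1."""
    return map_reads(genome, reads, k) <= map_reads(genome, reads, k + 1)


def random_dna(rng, length):
    """Generate a random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def run_metamorphic_tests(trials=50, seed=0):
    """Run both relations on random inputs and count violations."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        genome = random_dna(rng, 60)
        reads = [random_dna(rng, 5) for _ in range(8)]
        if not check_permutation_relation(genome, reads, rng):
            failures += 1
        if not check_mismatch_relation(genome, reads, 1):
            failures += 1
    return failures


if __name__ == "__main__":
    print("violations:", run_metamorphic_tests())
```

Neither check needs to know the correct mapping for any individual input; each compares a source output with a follow-up output. A violation would indicate a fault, whereas the absence of violations does not prove correctness.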
