Black-Box Testing of Deep Neural Networks

Several test adequacy criteria have been developed for quantifying the coverage of deep neural networks (DNNs) achieved by a test suite. Because they depend on the structure of the DNN, these criteria can be costly to measure and use, especially given the highly iterative nature of the model training workflow. Further, testing provides higher overall assurance when such implementation-dependent measures are used along with implementation-independent ones. In this paper, we rigorously define a new black-box coverage criterion that is independent of the DNN model under test. We further describe a few desirable properties and associated evaluation metrics for assessing test coverage criteria, and use them to empirically compare and contrast the black-box criterion with several DNN structural coverage criteria. Results indicate that the black-box criterion has comparable effectiveness and provides benefits that complement white-box criteria. The results also reveal a few weaknesses of coverage criteria for DNNs.