BugIdentifier: An Approach to Identifying Bugs via Log Mining for Accelerating Bug Reporting Stage

Bugs severely damage the reliability of open source software. To improve reliability, bug tracking systems are built to collect and manage bugs reported by users all over the world. When a system failure occurs, users investigate whether it was induced by a software bug and, if so, report the bug. However, identifying bugs from system failures is usually difficult and time-consuming. To accelerate bug reporting and reduce the time users spend identifying bugs, we present BugIdentifier, an automatic bug identification approach based on log mining. BugIdentifier combines Doc2Vec with a Deep Neural Network (DNN) and treats bug identification as a binary classification problem. Doc2Vec is used to train a log sequence embedding model that transforms log sequences into feature vectors, and the DNN then determines whether a log sequence is bug-induced. Our empirical evaluation shows that the approach automatically identifies real-world bugs in Hadoop and OpenStack with an F1-score higher than 75%; in particular, old-version bugs of OpenStack are identified with a 97% F1-score. As a result, bug reporting can be accelerated correspondingly.
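
The embed-then-classify pipeline and the F1-score metric can be illustrated with a minimal sketch. This is not the authors' implementation: to stay runnable with only the Python standard library, it swaps in normalised token counts for the Doc2Vec embedding and a single logistic unit for the DNN. The event identifiers (`E1` to `E5`), the labels, and all function names are hypothetical and chosen purely for illustration.

```python
import math

def build_vocabulary(sequences):
    """Map every distinct token in the training sequences to a vector index."""
    vocab = {}
    for seq in sequences:
        for token in seq.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def embed(sequence, vocab):
    """Stand-in for Doc2Vec: L2-normalised token counts; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for token in sequence.split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def score(model, vec):
    """Probability that the log sequence is bug-induced (logistic unit)."""
    weights, bias = model
    z = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(vectors, labels, epochs=200, lr=0.5):
    """Stand-in for the DNN: one logistic unit fitted by stochastic gradient descent."""
    weights, bias = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for vec, label in zip(vectors, labels):
            error = score((weights, bias), vec) - label
            weights = [w - lr * error * x for w, x in zip(weights, vec)]
            bias -= lr * error
    return weights, bias

def predict(model, vec):
    """Binary decision: 1 = bug-induced, 0 = not bug-induced."""
    return 1 if score(model, vec) >= 0.5 else 0

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall), for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: each string is a log sequence of event IDs.
train_sequences = ["E1 E2 E3", "E1 E2 E3 E3", "E1 E4 E5", "E1 E4 E5 E5"]
train_labels = [0, 0, 1, 1]  # 1 = bug-induced (illustrative labels only)

vocab = build_vocabulary(train_sequences)
model = train([embed(s, vocab) for s in train_sequences], train_labels)
predictions = [predict(model, embed(s, vocab)) for s in train_sequences]
print("F1 on the toy training set:", f1_score(train_labels, predictions))
```

In a closer reproduction, `embed` would presumably be replaced by a trained Doc2Vec model and `train` by a multi-layer network, while the surrounding steps (vectorise each log sequence, classify, score with F1) would keep the same shape.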