Using a deep gated RNN with a convolutional front end for end-to-end classification of heart sounds

Classifying heart sounds from a diverse set of phonocardiograms (PCGs) captured under different recording conditions is the challenging objective of the 2016 PhysioNet Challenge. We propose an end-to-end deep neural network that is fed raw PCGs and learns autonomously both to extract features and to classify the recordings. Our architecture combines convolutional and recurrent layers, followed by an attention mechanism that weights time steps by importance, and a dense multilayer perceptron as the classifier. Whereas deep neural networks currently used in speech recognition or computer vision are trained on up to millions of samples, only a restricted set of 3,153 heart sound recordings is available as training data. We work around this limitation by artificially enlarging the training set, augmenting the raw PCGs with various audio effects. With this moderately sized network we attain a high score of 0.89 on the validation data; however, the score on the hidden test data of the challenge is noticeably lower (0.82).
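The attention mechanism mentioned above pools the recurrent layer's hidden states into a single vector by weighting each time step according to a learned importance score. A minimal NumPy sketch of this pooling step (the scoring vector `w` would be learned jointly with the network; here it is simply given):

```python
import numpy as np

def attention_pool(h, w):
    """Weight RNN time steps by importance and pool them.

    h : (T, D) array of hidden states, one row per time step.
    w : (D,) scoring vector (learned in the real network; given here).
    Returns the (D,) context vector and the (T,) attention weights.
    """
    scores = h @ w                                 # one scalar score per time step
    scores = scores - scores.max()                 # numerical stability for softmax
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over time steps
    context = alpha @ h                            # importance-weighted sum of states
    return context, alpha

# Toy example: 5 time steps with 4-dimensional hidden states.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
w = rng.normal(size=4)
context, alpha = attention_pool(h, w)
```

The resulting context vector is what the dense multilayer perceptron would receive as input for classification.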
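Augmenting the raw PCGs with audio effects can be illustrated with a small sketch; the effects below (random gain, additive noise, circular time shift) are illustrative assumptions, not necessarily the exact effects used in the paper:

```python
import numpy as np

def augment(pcg, rng):
    """Create one perturbed copy of a raw PCG waveform.

    Illustrative effects only (assumed, not taken from the paper):
    random gain, additive Gaussian noise, and a circular time shift.
    """
    out = pcg * rng.uniform(0.8, 1.2)                  # random gain
    out = out + rng.normal(0, 0.005, size=out.shape)   # additive noise
    shift = int(rng.integers(0, out.shape[0]))
    return np.roll(out, shift)                         # circular time shift

rng = np.random.default_rng(42)
pcg = np.sin(np.linspace(0, 20 * np.pi, 2000))  # stand-in for a recording
augmented = [augment(pcg, rng) for _ in range(4)]  # 4 extra training copies
```

Each augmented copy keeps the original length and label, so the training set grows by a constant factor at negligible cost.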