Domain Adaptation for Food Intake Classification with Teacher/Student Learning

Automatic dietary monitoring (ADM) is a challenging application in wearable healthcare technologies. In this paper, we define an ADM system that performs food intake classification (FIC) over throat microphone recordings. We investigate the use of transfer learning to design an improved FIC system. Although labeled data from acoustic close-talk microphones are abundant, throat microphone data are scarce. Therefore, we propose a new adaptation framework based on teacher/student learning. The teacher network is trained over high-quality acoustic microphone recordings, whereas the student network distills the deep feature extraction capacity of the teacher over a parallel dataset. Our approach allows us to transfer the representational capacity of the teacher, adds robustness to the resulting model, and improves FIC performance on throat microphone recordings. The classification problem is formulated as spectro-temporal sequence recognition using Convolutional LSTM (ConvLSTM) models. We evaluate the proposed approach using a large-scale acoustic dataset collected from online recordings, an in-house food intake throat microphone dataset, and a parallel speech dataset. The bidirectional ConvLSTM network with the proposed domain adaptation approach consistently outperforms the SVM- and CNN-based baseline methods and attains 85.2% accuracy for the classification of 10 different food intake items. This corresponds to a 17.8% accuracy improvement from the proposed domain adaptation.
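
The sketch below illustrates the teacher/student feature-distillation idea described in the abstract. It is a minimal, hedged example in PyTorch: it substitutes a simple CNN + bidirectional LSTM encoder for the paper's ConvLSTM, and the architecture, loss weighting (alpha), and all names (SpectroTemporalEncoder, adaptation_step, etc.) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of teacher/student feature distillation over a parallel
# (acoustic, throat) dataset. Architectures and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectroTemporalEncoder(nn.Module):
    """Stand-in encoder: small CNN over spectrogram frames followed by a
    bidirectional LSTM (the paper uses ConvLSTM; this is a simplification)."""
    def __init__(self, n_mels=40, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),              # pool along frequency only
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.LSTM(32 * (n_mels // 4), feat_dim,
                           batch_first=True, bidirectional=True)

    def forward(self, spec):                   # spec: (B, 1, n_mels, T)
        h = self.cnn(spec)                     # (B, 32, n_mels // 4, T)
        h = h.permute(0, 3, 1, 2).flatten(2)   # (B, T, 32 * n_mels // 4)
        out, _ = self.rnn(h)                   # (B, T, 2 * feat_dim)
        return out.mean(dim=1)                 # utterance-level embedding

class FoodIntakeClassifier(nn.Module):
    def __init__(self, feat_dim=256, n_classes=10):
        super().__init__()
        self.encoder = SpectroTemporalEncoder()
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, spec):
        z = self.encoder(spec)
        return self.head(z), z

teacher = FoodIntakeClassifier()   # pretrained on acoustic close-talk recordings
student = FoodIntakeClassifier()   # to be adapted to throat microphone recordings
teacher.eval()

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
alpha = 0.5                        # assumed weight balancing the two losses

def adaptation_step(acoustic_spec, throat_spec, labels):
    """One update on a parallel (acoustic, throat) batch: the student mimics
    the teacher's deep features while also fitting the food intake labels."""
    with torch.no_grad():
        _, z_teacher = teacher(acoustic_spec)
    logits, z_student = student(throat_spec)
    distill_loss = F.mse_loss(z_student, z_teacher)   # feature-level distillation
    cls_loss = F.cross_entropy(logits, labels)
    loss = alpha * distill_loss + (1 - alpha) * cls_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In this reading of the approach, the teacher is frozen after being trained on the abundant acoustic-microphone data, and the parallel recordings supply paired inputs so the student's throat-microphone embeddings can be pulled toward the teacher's acoustic embeddings while the classification head is trained on the throat-domain labels.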