Developing a Meta Framework for Key-Value Memory Networks on HPC Clusters

We introduce DARE-MetaQA, a novel framework enabling large-scale Question Answering (QA). By enhancing the Key-Value Memory Network (KV-MemNN) model, our framework overcomes inherent limitations of the baseline model in two specific aspects: (1) prediction performance and (2) computational scalability. The overall architecture supports a meta-learning strategy in which multiple learning models are trained together and used jointly for inference. This meta framework is highly advantageous for developing a robust learning model suited to the complicated nature of the machine reasoning attempted by memory-augmented network models. To achieve the required computational scalability, our framework is designed to drive an ensemble of training models and multiple inference agents dynamically, leveraging parallel and distributed task and memory management. In this work, we focus on developing an optimized implementation of the computing environment for multi-node systems, specifically on two high-end cluster systems (XSEDE Comet and Bridges) equipped with multiple GPUs, along with a local runtime environment using an HPC container (Singularity). We highlight our main implementation decisions and achievements during development, and discuss the theoretical and technical underpinnings of the framework architecture as well as future work.
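To make the ensemble-driven design concrete, the following is a minimal, hypothetical sketch (not the authors' DARE-MetaQA implementation) of the orchestration pattern described above: several QA models are trained independently as ensemble members, each placed on its own GPU when available, and their predictions are combined at inference time. The `KVMemNN` class, the toy data, and the score-averaging strategy are illustrative assumptions only.

```python
import torch
import torch.nn as nn


class KVMemNN(nn.Module):
    """Hypothetical stand-in for a Key-Value Memory Network QA model."""

    def __init__(self, vocab_size=1000, embed_dim=64, num_answers=50):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # mean-pools question tokens
        self.out = nn.Linear(embed_dim, num_answers)

    def forward(self, question_tokens):
        return self.out(self.embed(question_tokens))


def train_member(model, data, device, epochs=1):
    """Train one ensemble member independently on its assigned device."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for questions, answers in data:
            questions, answers = questions.to(device), answers.to(device)
            opt.zero_grad()
            loss = loss_fn(model(questions), answers)
            loss.backward()
            opt.step()
    return model


def ensemble_predict(models, questions):
    """Average softmax scores across members (one simple meta strategy)."""
    with torch.no_grad():
        probs = [
            torch.softmax(m(questions.to(next(m.parameters()).device)), dim=-1).cpu()
            for m in models
        ]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)


if __name__ == "__main__":
    # Toy data: batches of (token-id questions, answer labels).
    data = [
        (torch.randint(0, 1000, (8, 10)), torch.randint(0, 50, (8,)))
        for _ in range(4)
    ]
    n_gpus = torch.cuda.device_count()
    members = []
    for i in range(3):  # three ensemble members, round-robin over available GPUs
        device = torch.device(f"cuda:{i % n_gpus}") if n_gpus else torch.device("cpu")
        members.append(train_member(KVMemNN(), data, device))
    preds = ensemble_predict(members, torch.randint(0, 1000, (8, 10)))
    print(preds)
```

In a multi-node HPC setting such as the one described in the paper, each `train_member` call would instead run as a separate distributed task (e.g., one per node or GPU), with the prediction-aggregation step handled by dedicated inference agents; the snippet above only illustrates the single-node shape of that pattern.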