Data-Efficient Multimodal Fusion on a Single GPU