Audio-visual sensor fusion with neural architectures