Learning Visual-to-Audio Mapping with Locally Linear Maps on Manifold Structure (abstract only)