Voice conversion based on deep neural network with multiple output sub-networks