System of microphone arrays and neural networks for robust speech recognition in multimedia environments