Collection and analysis of spontaneous and read corpora for spoken language system development