Building an effective corpus by using acoustic space visualization (COSMOS) method [speech recognition applications]

This paper proposes a technique for building an effective corpus at lower cost by using the COSMOS method (comprehensive space map of objective signal, previously acoustic space map of sound), which visualizes multiple HMM acoustic models in a 2D space. In the experiment, adapted acoustic models of 533 male speakers were built from a small quantity of voice samples (10 words) per speaker. A plotted map (called a COSMOS map) of all 533 male speakers was then generated with the COSMOS method. A corpus was built by selecting only the 200 male speakers located on the periphery of the distribution in the COSMOS map and collecting voice samples (165 words) from each of them. The acoustic model trained on this corpus showed higher performance than models trained on corpora built from 200 male speakers selected randomly from the COSMOS map or from all 533 male speakers in the map.
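
The speaker selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how "periphery" is measured, so this sketch assumes it means the speakers whose 2D map coordinates lie farthest from the centroid of the distribution. The coordinates are synthetic Gaussian points standing in for a real COSMOS map, and the names `select_peripheral`, `select_random`, and `cosmos_map` are hypothetical.

```python
import math
import random


def select_peripheral(points, k):
    """Return indices of the k points farthest from the centroid.

    Assumption: "periphery" is approximated by Euclidean distance
    from the centroid of the 2D map; the paper may define it differently.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    dist = [math.hypot(x - cx, y - cy) for x, y in points]
    # Rank speakers from farthest to nearest and keep the first k.
    ranked = sorted(range(n), key=lambda i: dist[i], reverse=True)
    return sorted(ranked[:k])


def select_random(points, k, seed=0):
    """Return k distinct indices drawn uniformly at random (baseline)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(points)), k))


# Synthetic stand-in for a COSMOS map: one 2D point per speaker.
rng = random.Random(42)
cosmos_map = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(533)]

peripheral = select_peripheral(cosmos_map, 200)  # proposed selection
baseline = select_random(cosmos_map, 200)        # random selection
```

In this sketch, `peripheral` and `baseline` each hold 200 speaker indices. In the setting of the abstract, the larger 165-word recordings would be collected only for the selected speakers, and the resulting corpora compared by training an acoustic model on each.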