There may exist common factors, independent of language and culture, in human perception of emotion via speech sounds. This study investigated these factors using subjects from Japan, the United States, and China, none of whom had experience living abroad. An emotional speech database without linguistic information was used, and the speech was evaluated along three and six emotional dimensions. It was found that most speech materials were perceived to have multiple emotional components, even though the speakers had intended to express a single emotion. As listeners' evaluations of the intended emotion decreased, the components of the other emotions were perceived more strongly. This phenomenon was common across the three cultures. Principal component analysis showed that the loading patterns of the explanatory variables were consistent across the three cultures at a cover rate of about 67%. When the evaluation was extended from three emotions to six, it was found that anger, joy, and sadness may constitute three basic emotions, while the other emotions converged to those basic emotions with about 60% accuracy.