Acoustic Forest for SMAP-Based Speaker Verification

In speaker verification, structural maximum-a-posteriori (SMAP) adaptation of Gaussian mixture models (GMMs) has proven effective, especially when the speech segment is very short. SMAP adaptation constructs an acoustic tree of Gaussian components to represent the acoustic space hierarchically. Until now, however, there has been no clear way to find the optimal tree structure for a given speaker automatically. In this paper, we propose using an acoustic forest, that is, a set of trees, for SMAP adaptation instead of a single tree. In this approach, we combine the results of SMAP adaptation systems built on different acoustic trees. A key issue is how to combine the trees. We explore three score fusion techniques and evaluate our approach on the text-independent speaker verification task of the NIST 2006 Speaker Recognition Evaluation (SRE) plan using 10-second speech segments. The proposed method decreased the equal error rate (EER) by 3.2% compared with relevance MAP adaptation and by 1.6% compared with conventional SMAP using a single tree.
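As an illustration of score-level combination across the trees of a forest, the following is a minimal sketch, not the fusion techniques evaluated in the paper, which the abstract does not name. It assumes each tree-specific SMAP system has already produced one verification score per trial, normalizes each system's scores, combines them with a weighted linear sum, and estimates the EER by a simple threshold sweep. The function names `fuse_scores` and `equal_error_rate`, the z-normalization step, and the toy scores are all hypothetical.

```python
import numpy as np


def fuse_scores(scores, weights=None):
    """Weighted linear fusion of per-system scores (illustrative assumption).

    scores: array-like of shape (n_trees, n_trials), one row per SMAP system.
    weights: optional array-like of shape (n_trees,); defaults to equal weights.
    """
    scores = np.asarray(scores, dtype=float)
    # Normalize each system's scores so the rows are on a comparable scale.
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant rows
    normed = (scores - mean) / std
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ normed


def equal_error_rate(scores, labels):
    """Approximate EER by sweeping a threshold over the observed scores.

    labels: True (or 1) for target trials, False (or 0) for impostor trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        accept = scores >= threshold
        far = np.mean(accept[~labels])  # false acceptance rate
        frr = np.mean(~accept[labels])  # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy scores from two hypothetical tree-specific systems on four trials.
    per_tree_scores = [[2.0, 3.0, -1.0, -2.0],
                       [1.5, 2.5, -0.5, -3.0]]
    trial_labels = [1, 1, 0, 0]
    fused = fuse_scores(per_tree_scores)
    print("fused scores:", fused)
    print("EER:", equal_error_rate(fused, trial_labels))
```

Equal weights are the simplest choice; in practice, fusion weights would typically be tuned on held-out development trials rather than on the evaluation set.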