Sound source separation for a robot based on pitch

We present a novel method for separating monaurally recorded speech signals based on pitch. The method is inspired by the ability of some auditory neurons to phase-lock to the excitation signal. After applying a Gammatone filter bank to the original signal, we compare the distances between zero crossings of possible harmonics and decide from this comparison whether they share the same fundamental and hence originate from the same sound source. For higher frequencies, we use the amplitude modulation of unresolved harmonics to determine their fundamental frequency. Compared with standard autocorrelation-based methods, our method tracks the pitch more precisely and, in particular, opens the way to also extracting the pitch contour of a second speaker or of other sound sources, which can be important for the robot's behavior. Sound source separation tests on a database with several speakers and a large set of intrusions show that our algorithm performs slightly better than the commonly used autocorrelation approach at lower computational cost.
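
The zero-crossing comparison for resolved harmonics can be illustrated with a minimal sketch. It is not the authors' implementation: pure sinusoids stand in for the outputs of individual Gammatone channels, the harmonic number assigned to each channel is assumed to be known, and the function names and the 2% tolerance are hypothetical choices made only for this example. The idea shown is that a channel dominated by the k-th harmonic has zero crossings spaced one k-th of the fundamental period apart, so the distance spanned by k consecutive crossings should match across channels that share a fundamental.

```python
import numpy as np

def rising_zero_crossings(x, fs):
    """Times (s) of negative-to-positive zero crossings, linearly interpolated."""
    x = np.asarray(x, dtype=float)
    idx = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))
    # Fraction of a sample between x[idx] (< 0) and x[idx + 1] (>= 0).
    frac = -x[idx] / (x[idx + 1] - x[idx])
    return (idx + frac) / fs

def candidate_period(crossings, harmonic_number):
    """Median distance spanned by `harmonic_number` consecutive crossings."""
    k = harmonic_number
    spans = crossings[k:] - crossings[:-k]
    return float(np.median(spans))

def share_fundamental(x_a, k_a, x_b, k_b, fs, rel_tol=0.02):
    """Compare the fundamental periods implied by two channels.

    k_a and k_b are the assumed harmonic numbers of the two channels;
    rel_tol is an arbitrary tolerance chosen for this sketch.
    """
    t_a = candidate_period(rising_zero_crossings(x_a, fs), k_a)
    t_b = candidate_period(rising_zero_crossings(x_b, fs), k_b)
    same = abs(t_a - t_b) <= rel_tol * max(t_a, t_b)
    return same, t_a, t_b

# Stand-ins for filter-bank outputs (assumption: one harmonic per channel).
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
second_of_100 = np.sin(2 * np.pi * 200 * t + 0.3)  # 2nd harmonic of 100 Hz
third_of_100 = np.sin(2 * np.pi * 300 * t + 1.1)   # 3rd harmonic of 100 Hz
third_of_110 = np.sin(2 * np.pi * 330 * t + 0.7)   # 3rd harmonic of 110 Hz

print(share_fundamental(second_of_100, 2, third_of_100, 3, fs))
print(share_fundamental(second_of_100, 2, third_of_110, 3, fs))
```

In this toy setup the first pair implies a common period of about 10 ms and is grouped together, while the second pair implies periods of about 10 ms and 9.1 ms and is kept apart. A real system would additionally have to propose the harmonic numbers and handle channels in which harmonics are not resolved, which the sketch does not attempt.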