Collaborative personal speaker identification: A generalized approach

This paper introduces a collaborative personal speaker identification system that annotates conversations and meetings using speech-independent speaker modeling and a single audio channel. The system can operate in standalone and collaborative modes, and it learns online about speakers that were initially detected as unknown. In collaborative mode, the system exchanges current speaker information with the personal systems of other participants to improve identification performance. Our collaboration concept relies on distributed personal systems only and therefore requires no dedicated infrastructure. We present a generalized description of collaboration situations and derive three use scenarios, in which the system was subsequently evaluated. Compared to standalone operation, collaboration among four personal identification systems increased system performance by up to 9% for 4 relevant speakers and by up to 21% for 24 relevant speakers. Allowing unknown speakers in a conversation did not reduce the performance gains from collaboration. In a scenario where the individual systems had nonidentical speaker sets, collaboration gains were 16% for 24 relevant speakers.
