Speaker Identification for Business-Card-Type Sensors

Human collaboration has a great impact on the performance of multi-person activities. Analyzing who speaks and when makes it possible to extract detailed data on human collaboration. Some studies have extracted such data by identifying the speaker with business-card-type sensors. However, achieving speaker identification with business-card-type sensors at both low cost and high accuracy is difficult because of spikes in the measured sound pressure data, ambient noise picked up by the sensors of non-speakers, and synchronization errors among the sensors. This study proposes a novel sound pressure sensor and a speaker identification algorithm to realize speaker identification for business-card-type sensors. The sensor extracts the user's speech at low cost and with high accuracy by employing a peak hold circuit for spike mitigation and a time synchronization module for precise time synchronization. The algorithm identifies the speaker with high accuracy by removing ambient noise. The evaluations show that the algorithm accurately identifies the speaker in multi-person activities across varying numbers of users, environmental noise, and reverberation conditions, for both long and short utterances. In addition, the peak hold circuit enables accurate extraction of speech, and the synchronization error between the sensors is always within ±30 μs, which is negligible.
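The abstract does not give the sensor's signal chain or the exact noise-removal rule, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' method. It assumes (1) a peak hold stage can be emulated digitally by a running peak of the rectified signal with geometric decay, (2) ambient noise per sensor can be approximated by a low percentile of that sensor's frame levels, and (3) once frames are time-synchronized across sensors, the speaker is the wearer whose sensor shows the largest noise-subtracted level. The functions `peak_hold`, `estimate_noise_floor`, and `identify_speaker` and the parameters `decay`, `q`, and `threshold` are hypothetical.

```python
import numpy as np


def peak_hold(samples, decay=0.99):
    # Hypothetical digital analogue of a peak hold detector: the held level
    # jumps to each new peak of |x| and otherwise decays geometrically.
    rectified = np.abs(np.asarray(samples, dtype=float))
    held = np.empty_like(rectified)
    level = 0.0
    for i, x in enumerate(rectified):
        level = max(x, level * decay)
        held[i] = level
    return held


def estimate_noise_floor(levels, q=10):
    # Assumed ambient-noise estimate: the q-th percentile of each sensor's
    # frame levels (rows = sensors, columns = time-synchronized frames).
    return np.percentile(np.asarray(levels, dtype=float), q, axis=1)


def identify_speaker(levels, noise_floor, threshold=2.0):
    # Subtract each sensor's noise floor, then pick, per frame, the sensor
    # with the largest excess level; -1 marks frames where no sensor's
    # excess exceeds the (hypothetical) threshold, i.e. nobody is speaking.
    levels = np.asarray(levels, dtype=float)
    excess = levels - np.asarray(noise_floor, dtype=float)[:, None]
    best = np.argmax(excess, axis=0)
    best_excess = excess[best, np.arange(levels.shape[1])]
    return np.where(best_excess > threshold, best, -1)


# Two sensors, three synchronized frames: sensor 0 speaks, then sensor 1,
# then silence.
levels = np.array([[10.0, 3.0, 3.0], [3.0, 9.0, 3.0]])
print(identify_speaker(levels, noise_floor=[2.0, 2.0]).tolist())  # [0, 1, -1]
```

Comparing levels frame by frame across sensors only makes sense if the frames refer to the same instant, which is why tight time synchronization between the cards matters for this kind of decision rule.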
