Efficient speech edge detection for mobile health applications

Intelligent audio sensors that continuously record and analyze sound are a critical component of many emerging and future embedded applications. These applications operate under a very tight power budget, and the analog front end consumes a major share of it. An efficient analog front end should adapt its power consumption to the instantaneous bandwidth of the audio signal of interest, rather than constantly drawing a fixed amount of power sized for a fixed signal bandwidth. In this paper, we introduce a novel algorithm that identifies the edges of speech in the time-frequency domain and uses them to detect the instantaneous bandwidth of speech. A circuit implementation of our algorithm consumes 42.4 µW and extracts the instantaneous bandwidth of a signal to within 1% accuracy, even at SNRs as low as 10 dB.
