Chip design of portable speech memopad suitable for persons with visual disabilities

This paper presents the design of a speech recognition and compression chip for portable memopad devices, especially suitable for use by the visually impaired. The proposed chip design is based on several cores of which they can be regarded as intellectual property (IP) cores to be used for a variety of speech-related application systems. A cepstrum extraction core and a dynamic warping core are designed for mapping the speech recognition algorithms. In the cepstrum extraction core, a novel architecture computes the autocorrelation between the overlapping frames using two pairs of shift registers and an intelligent accumulation procedure. The architecture of the dynamic time warping core uses only a single processing element, and is based on our extensive study of the relationship among the nodes in the dynamic time warping lattice. Bit rate is the key factor affecting the memory size for speech compression; therefore, a very low bit-rate speech coder is used. The speech coder exploits a line-spectrum-based interpolation method, which yields fine quality synthesized speech despite the low 1.6 kbps bit rate. The 1.6 kbps vocoder core is cost-effective, and it integrates both encoder and decoder algorithms. The proposed design has been tested via hardware simulations on Xilinx Virtex series FPGAs and a semi-custom chip fabricated by 0.35 /spl mu/m CMOS single-poly-four-metal technology on a die size approximately 4.46/spl times/4.46 mm/sup 2/.

[1]  S. Furui,et al.  Cepstral analysis technique for automatic speaker verification , 1981 .

[2]  Ahmet M. Kondoz,et al.  Digital Speech: Coding for Low Bit Rate Communication Systems , 1995 .

[3]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[4]  John H. L. Hansen,et al.  Discrete-Time Processing of Speech Signals , 1993 .

[5]  Andreas Spanias,et al.  Cepstrum-based pitch detection using a new statistical V/UV classification algorithm , 1999, IEEE Trans. Speech Audio Process..

[6]  J. Makhoul,et al.  Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[7]  Joseph P. Campbell,et al.  Voiced/Unvoiced classification of speech with applications to the U.S. government LPC-10E algorithm , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  Jhing-Fa Wang,et al.  Extraction of pitch information in noisy speech using wavelet transform with aliasing compensation , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[9]  Jhing-Fa Wang,et al.  VLSI architecture and implementation for FS1016 CELP decoder with reduced power and memory requirements , 1997, Integr..

[10]  Jhing-Fa Wang,et al.  An ASIC design for linear predictive coding of speech signals , 1992, Proceedings Euro ASIC '92.

[11]  Biing-Hwang Juang,et al.  Line spectrum pair (LSP) and speech data compression , 1984, ICASSP.

[12]  David J. Goodman,et al.  Subjective quality of the same speech transmission conditions in seven different countries , 1982, ICASSP.

[13]  Keshab K. Parhi,et al.  VLSI digital signal processing systems , 1999 .