Robust algorithms for speech reconstruction on mobile devices

Xu Shao

[1] Masanobu Abe, et al. A Text-to-Speech System with Transformation of Spectrum Envelope according to Fundamental Frequency, 1997.

[2] Khalid Sayood, et al. Introduction to Data Compression, 1996.

[3] Roy D. Patterson, et al. SVOS final report: The auditory filterbank, 1988.

[4] J. Markel, et al. The SIFT algorithm for fundamental frequency estimation, 1972.

[5] J. Pickles. An Introduction to the Physiology of Hearing, 1982.

[6] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, 1980.

[7] John G. Proakis, et al. Probability, random variables and stochastic processes, 1985, IEEE Trans. Acoust. Speech Signal Process.

[8] Jean Rouat, et al. A pitch determination and voiced/unvoiced decision algorithm for noisy speech, 1995, Speech Commun.

[9] Tenkasi Ramabadran, et al. Enhancing distributed speech recognition with back-end speech reconstruction, 2001, INTERSPEECH.

[10] S. Boll, et al. Suppression of acoustic noise in speech using spectral subtraction, 1979.

[11] James F. Kaiser, et al. Some useful properties of Teager's energy operators, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] G. Békésy. Description of Some Mechanical Properties of the Organ of Corti, 1953.

[13] Shigeki Sagayama, et al. Multiple-regression hidden Markov model, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[14] Xu Shao, et al. Pitch prediction from MFCC vectors for speech reconstruction, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15] Roger C. F. Tucker, et al. Compression of acoustic features: are perceptual quality and recognition performance incompatible goals?, 1999, EUROSPEECH.

[16] Taoufik En-Najjary, et al. A new method for pitch prediction from spectral envelope and its application in voice conversion, 2003, INTERSPEECH.

[17] Guy J. Brown, et al. A multi-pitch tracking algorithm for noisy speech, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18] Malcolm Slaney, et al. An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank, 1997.

[19] T. Hirahara. On the role of the fundamental frequency in vowel perception, 1988.

[20] Xu Shao, et al. Low bit-rate feature vector compression using transform coding and non-uniform bit allocation, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP '03).

[21] Alvin F. Martin, et al. The NIST 1999 Speaker Recognition Evaluation: An Overview, 2000, Digit. Signal Process.

[22] M. Symons. An introduction to probability, decision, and inference, 1971.

[23] Stephen J. Cox, et al. Integrated pitch and MFCC extraction for speech reconstruction and speech recognition applications, 2003, INTERSPEECH.

[24] Xu Shao, et al. Transform-based feature vector compression for distributed speech recognition, 2002, INTERSPEECH.

[25] S. Seneff. A joint synchrony/mean-rate model of auditory speech processing, 1990.

[26] Stephanie Seneff, et al. A computational model for the peripheral auditory system: Application of speech recognition research, 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[27] Thomas F. Quatieri, et al. Pitch estimation and voicing detection based on a sinusoidal speech model, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[28] G. McLachlan, et al. The EM algorithm and extensions, 1996.

[29] Chao Huang, et al. Large vocabulary Mandarin speech recognition with different approaches in modeling tones, 2000, INTERSPEECH.

[30] C. W. Therrien, et al. Decision, Estimation and Classification: An Introduction to Pattern Recognition and Related Topics, 1989.

[31] Harald Singer, et al. Pitch dependent phone modelling for HMM based speech recognition, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[32] Ben P. Milner, et al. Robust speech recognition over mobile and IP networks in burst-like packet loss, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[33] R. J. McAulay, et al. Speech transformations based on a sinusoidal representation, 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[34] Hong Kook Kim, et al. A bitstream-based front-end for wireless speech recognition on IS-136 communications system, 2001, IEEE Trans. Speech Audio Process.

[35] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.

[36] E. Owens, et al. An Introduction to the Psychology of Hearing, 1997.

[37] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[38] Yannis Stylianou, et al. Stochastic modeling of spectral adjustment for high quality pitch modification, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[39] Philippe Martin. Comparison of pitch detection by cepstrum and spectral comb analysis, 1982, ICASSP.

[40] G. Forney Jr., et al. The Viterbi algorithm, 1973.

[41] Meir Tzur, et al. Efficient periodicity extraction based on sine-wave representation and its application to pitch determination of speech signals, 2001, INTERSPEECH.

[42] J. P. Martens, et al. Pitch and voiced/unvoiced determination with an auditory model, 1992, The Journal of the Acoustical Society of America.

[43] Richard F. Lyon, et al. A perceptual pitch detector, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[44] William E. Brownell. How the ear works: nature's solutions for listening, 1997, The Volta Review.

[45] Stephan Euler, et al. The influence of speech coding algorithms on automatic speech recognition, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[46] Kuldip K. Paliwal, et al. Effect of speech coders on speech recognition performance, 1996, Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96).

[47] Abeer Alwan, et al. Modeling auditory perception for robust speech recognition, 1998.

[48] Ahmet M. Kondoz, et al. Digital Speech: Coding for Low Bit Rate Communication Systems, 1995.

[49] Charles W. Therrien, et al. Discrete Random Signals and Statistical Signal Processing, 1992.

[50] B. Sankur, et al. Applications of Walsh and related functions, 1986.

[51] Richard M. Stern, et al. Speech recognition from GSM codec parameters, 1998, ICSLP.

[52] Qin Yan, et al. Predicting formant frequencies from MFCC vectors [speech recognition applications], 2005, Proceedings (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[53] Xu Shao, et al. MAP prediction of pitch from MFCC vectors for speech reconstruction, 2004, INTERSPEECH.

[54] Guy J. Brown, et al. Computational auditory scene analysis, 1994, Comput. Speech Lang.

[55] Jonathan Darch, et al. Formant prediction from MFCC vectors, 2004.

[56] Brian R. Glasberg, et al. Derivation of auditory filter shapes from notched-noise data, 1990, Hearing Research.

[57] Vassilios Digalakis, et al. Quantization of cepstral parameters for speech recognition over the World Wide Web, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[58] Bhiksha Raj, et al. Distributed speech recognition with codec parameters, 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

[59] Xu Shao, et al. Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model, 2002, INTERSPEECH.

[60] Thomas F. Quatieri, et al. Mixed-phase deconvolution of speech based on a sine-wave model, 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[61] Saeed Vaseghi, et al. Noise compensation methods for hidden Markov model speech recognition in adverse environments, 1997, IEEE Trans. Speech Audio Process.

[62] Thomas F. Quatieri, et al. Phase coherence in speech reconstruction for enhancement and coding applications, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[63] Ann K. Syrdal, et al. Vowel F1 as a function of speaker fundamental frequency, 1985.

[64] Piero Cosi, et al. Auditory modeling techniques for robust pitch extraction and noise reduction, 1998, ICSLP.

[65] Andreas Spanias, et al. Speech coding: a tutorial review, 1994, Proc. IEEE.

[66] Thomas F. Quatieri, et al. Speech analysis/synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.