Robust algorithms for speech reconstruction on mobile devices

An optical data storage system is contemplated that employs a data-modulated writing laser beam and a non-erasing reading laser beam of predetermined wavelength. Improved optical media for such systems are described. They are characterized by multiple layers whose optical characteristics and thicknesses are chosen to accommodate a prescribed writing and reading energy and wavelength, providing an "anti-reflection" condition for unrecorded portions of the medium and a relatively higher reflectivity for recorded portions. A preferred optical medium includes a highly reflective aluminum layer, a relatively transparent polymer spacer layer overlying the reflective layer, and an optical absorber (recording) layer overlying the spacer layer. The overcoating structure is specified in some detail, e.g., as a "soft pad" layer (e.g., fluoropolymer) on the absorber, with a "hard" layer (e.g., radiation-cured acrylic) laid over the "soft pad" as an outer protective overcoat. The spacer itself may also be rendered as such a "soft pad".
