Speech Recognition Using Scaly Neural Networks

Akram M. Othman and May H. Riadh

This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words was established first; these words are "word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2". The chosen words are associated with executing computer functions such as opening a file, printing a text document, cutting, copying, pasting, editing, and exiting. The words are introduced to the computer and then subjected to a feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker-dependent mode. Half of the recorded words are used for training the artificial neural network, and the other half are used for testing the system and for information retrieval. The system consists of three parts: speech processing and feature extraction, training and testing using neural networks, and information retrieval. The retrieval process proved to be 79.5-88% successful, which is quite acceptable considering variations in the surroundings, the state of the speaker, and the microphone type.

Keywords: Feature extraction, linear prediction coefficients, neural networks, speech recognition, scaly ANN.

I. GENERAL DESCRIPTION

Speech conveys information, and what we are concerned with in computer speech processing is the transmission and reception of that information. This is not as simple as it might seem, because speech conveys at least three different kinds of information simultaneously. The most important of these is what we might call linguistic information: the kind of information that is generally regarded as the meaning of an utterance. With the growth in the use of digital computers, the prospect of using speech as an input to a computer for entering data, retrieving information, or transmitting commands has led to renewed interest in the speech field [7].
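The "scaly" architecture gives each hidden unit an overlapping local window of the input rather than full connectivity to the whole feature vector. This excerpt does not specify the network's exact topology, so the window size, overlap, random weights, and activation below are illustrative assumptions; a minimal numpy sketch of one such locally connected layer:

```python
import numpy as np

def scaly_layer(x, field=4, overlap=2, rng=None):
    """Forward pass through one 'scaly' (locally connected) layer:
    each hidden unit sees only an overlapping window of the input,
    instead of the full vector as in a fully connected layer.
    Window size and overlap are illustrative, not from the paper."""
    if rng is None:
        rng = np.random.default_rng(0)           # fixed seed for repeatability
    step = field - overlap
    out = []
    for s in range(0, len(x) - field + 1, step):
        w = rng.normal(scale=0.1, size=field)    # this window's weights
        out.append(np.tanh(w @ x[s:s + field]))  # local activation
    return np.array(out)

# One frame of 12 LPC coefficients (dummy values for illustration)
frame = np.linspace(-1.0, 1.0, 12)
h = scaly_layer(frame)
print(h.shape)   # one activation per overlapping window
```

With 12 inputs, a window of 4, and an overlap of 2, the layer produces 5 activations; stacking such layers and a fully connected output layer over the class labels would complete a classifier of this family.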
II. SPEECH RECOGNITION

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. The recognized words can be the final result, as in applications such as command and control, data entry, and document preparation or retrieval [17].

A. M. Othman was with the SE Department, Faculty of Information Technology, ASU, Jordan. He is now with the MIS Department, Amman Arab University for Graduate Studies, AAU, Jordan (corresponding phone: +962-796184806; fax: +962-6-5516103; e-mail: akram.othman@aau.edu.jo). M. H. Riadh was with the Informatics Institute for Postgraduate Studies, Baghdad, Iraq. She is now with the IT Department, Al-Hussein Bin-Talal University, Ma'an, Jordan (corresponding phone: +962-799890287; fax: +962-3-2179050; e-mail: mayhr60@yahoo.com, may.riadh@ahu.edu.jo).

The basic assumption of the whole-word pattern-matching approach is that different utterances of the same word by a particular talker result in similar patterns of sound. There will be variation in spectrum shape at corresponding parts of the patterns from the same word. There will also be variations in the time scale of the patterns, and this makes it difficult to compare corresponding parts [10].

III. SPEECH RECOGNITION SYSTEM

Speech recognition is, in its most general form, a conversion from an acoustic waveform to a written equivalent of the message information. Fig. 1 shows a basic speech recognition system [13].

Fig. 1 The basic speech recognition system (the speech signal S(t) passes through preprocessing, feature extraction, and classification to yield a feature vector and a recognized word)

IV. SPEECH SIGNAL PROCESSING AND FEATURE EXTRACTION

Speech signal processing and feature extraction form the initial stage of any speech recognition system; it is through this component that the system views the speech signal itself. "Speech signal processing" refers to the operations we perform on the speech signal (e.g., filtering, digitization, spectral analysis).
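The LPC features used as the network's input are conventionally computed with the autocorrelation method and the Levinson-Durbin recursion. The paper's exact analysis order and framing are not given in this excerpt, so the order and the demo signal below are illustrative:

```python
import numpy as np

def lpc(frame, order=12):
    """Linear prediction coefficients via the autocorrelation method
    and the Levinson-Durbin recursion. Returns a_1..a_p of the
    prediction-error filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p."""
    n = len(frame)
    # Autocorrelation lags r[0..order]
    r = [float(np.dot(frame[: n - k], frame[k:])) for k in range(order + 1)]
    a = [1.0]
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k                   # remaining prediction-error energy
    return np.array(a[1:])

# Demo on a decaying exponential x[n] = 0.9**n, which an order-1
# predictor models almost exactly, so a_1 comes out close to -0.9
x = 0.9 ** np.arange(200)
coeffs = lpc(x, order=2)
print(coeffs)
```

A real front end would first window each 10-30 ms frame (e.g., with a Hamming window) before computing the autocorrelation; that step is omitted here for brevity.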
"Feature extraction" is a pattern recognition term that refers to the characterizing measurements performed on a pattern (or signal). These features form the input to the classifier that recognizes the pattern [10].

A. Sampling and Quantizing Continuous Speech

The acoustic speech signal exists as pressure variations in the air. A microphone converts these pressure variations into an electric current that is related to the pressure (similarly, the ear converts the pressure variations into a series of nerve impulses that are transmitted to the brain). To process the speech signal digitally, it is necessary to make the analog waveform discrete in both time (sampling) and amplitude (quantization) [2]. The general nature of digital speech waveform representations is depicted in Fig. 2 [13].

Fig. 2 General nature of digital speech waveform representations: the continuous-time signal xa(t) is sampled to the sequence x(n) = xa(nT) and quantized to the finite-precision samples x^(n)

World Academy of Science, Engineering and Technology, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering, Vol:2, No:2, 2008
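The sampling-and-quantization chain of Fig. 2 can be sketched numerically. The sampling rate, test tone, and bit depth below are illustrative choices, not values from the paper:

```python
import numpy as np

# Fig. 2's chain: continuous xa(t) -> sampled x(n) = xa(nT) ->
# finite-precision samples x^(n) from a uniform quantizer.
fs = 8000                               # sampling rate in Hz (typical for speech)
T = 1.0 / fs
n = np.arange(160)                      # one 20 ms frame at 8 kHz
x = np.sin(2 * np.pi * 440 * n * T)     # stands in for xa(nT)

B = 8                                   # bits per sample
step = 2.0 / 2 ** B                     # uniform quantizer over [-1, 1)
xq = np.round(x / step) * step          # quantized samples x^(n)

# Quantization SNR is roughly 6 dB per bit for a full-scale sine
snr_db = 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))
print(round(snr_db, 1))
```

Raising B by one bit roughly adds 6 dB of signal-to-quantization-noise ratio, which is why telephone-band speech systems commonly use 8 or more bits per sample.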