Screening and analysis of specific language impairment in young children by analyzing the textures of speech signal

A child whose language skills develop late without any apparent cause is said to have specific language impairment (SLI). Almost 7% of kindergarten children are reported to have SLI. SLI can be treated if it is identified at an early stage, but diagnosing it that early is challenging. In this article, we propose a machine-learning-based system that screens for SLI by analyzing the texture of speech utterances. The texture of a speech signal is extracted from its spectrogram, a widely used time-frequency representation. Each spectrogram is treated as a texture image, from which textural features that capture changes in audio quality, namely Haralick's features and local binary patterns (LBPs), are extracted. The experiments are performed on 4214 utterances taken from 44 healthy and 54 SLI speakers. Experimental results with 10-fold cross-validation indicate that an accuracy of up to 97.41% is obtained when only the 14-dimensional Haralick feature is used. The accuracy rises to 99% when the 59-dimensional LBPs are combined with Haralick's features. The sensitivity and specificity of the whole system are up to 98.96% and 99.20%, respectively. The proposed method is gender and speaker independent and invariant to examination conditions.
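The feature-extraction idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, number of gray levels, the single horizontal co-occurrence offset, and all function names are assumptions chosen for illustration. Only four of Haralick's 14 statistics are computed, the LBP uses 8 neighbors at radius 1 with the uniform-pattern mapping (which yields the 59 bins mentioned in the abstract), and a synthetic tone stands in for a speech utterance. The classifier and cross-validation stages are not shown.

```python
import numpy as np

def spectrogram_image(signal, frame_len=256, hop=128, levels=16):
    """Log-magnitude STFT quantized to `levels` gray levels (a texture image)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    log_mag = np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T  # (freq, time)
    span = log_mag.max() - log_mag.min()
    norm = (log_mag - log_mag.min()) / (span if span > 0 else 1.0)
    return np.minimum((norm * levels).astype(int), levels - 1)

def haralick_subset(img, levels=16):
    """Four of Haralick's statistics from a symmetric horizontal-neighbor GLCM."""
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (img[:, :-1].ravel(), img[:, 1:].ravel()), 1)
    glcm = glcm + glcm.T                      # count each pair in both directions
    p = glcm / glcm.sum()                     # joint probability of gray-level pairs
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)                      # angular second moment
    contrast = np.sum(p * (i - j) ** 2)
    idm = np.sum(p / (1 + (i - j) ** 2))      # inverse difference moment
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([asm, contrast, idm, entropy])

def uniform_lbp_histogram(img):
    """Normalized 59-bin histogram of uniform LBP codes (8 neighbors, radius 1)."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbors visited in circular order around the center pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(int) << bit

    def transitions(code):
        bits = [(code >> k) & 1 for k in range(8)]
        return sum(bits[k] != bits[(k + 1) % 8] for k in range(8))

    # 58 codes have at most two 0/1 transitions; all others share bin 58.
    uniform = [c for c in range(256) if transitions(c) <= 2]
    lut = np.full(256, 58)
    lut[uniform] = np.arange(len(uniform))
    hist = np.bincount(lut[codes].ravel(), minlength=59).astype(float)
    return hist / hist.sum()

# Synthetic stand-in for an utterance: a noisy tone (assumed 16 kHz sampling).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)

img = spectrogram_image(signal)
features = np.concatenate([haralick_subset(img), uniform_lbp_histogram(img)])
print(features.shape)  # (63,)
```

With all 14 Haralick statistics in place of the four shown here, the concatenated vector would have 14 + 59 = 73 dimensions, and one such vector per utterance would be passed to the classifier.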
