Neurolinguistic approach to natural language processing with applications to medical text analysis

Understanding written or spoken language presumably involves spreading neural activation in the brain. This process may be approximated by spreading activation in semantic networks, providing enhanced representations that involve concepts not found directly in the text. The approximation of this process is of great practical and theoretical interest. Although activations of the neural circuits involved in the representation of words change rapidly in time, snapshots of these activations spreading through associative networks may be captured in a vector model. Concepts of similar type activate larger clusters of neurons, priming areas in the left and right hemispheres. Analysis of recent brain imaging experiments shows the importance of right-hemisphere non-verbal clusterization. Medical ontologies enable the development of a large-scale practical algorithm to re-create pathways of spreading neural activations. First, concepts of a specific semantic type are identified in the text, and then all related concepts of the same type are added to the text, providing expanded representations. To avoid rapid growth of the extended feature space, after each step only the most useful features, those that improve document clustering, are retained. Short hospital discharge summaries are used to illustrate how this process works on real, very noisy data. Expanded texts show significantly improved clustering and may be classified with much higher accuracy. Although better approximations to the spreading of neural activations may be devised, the practical approach presented in this paper helps to discover pathways used by the brain to process specific concepts, and may be used in large-scale applications.
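The expansion-and-pruning step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy `ONTOLOGY`, the document labels, and the `cluster_score` measure (mean within-class minus mean between-class Jaccard similarity) are all hypothetical stand-ins chosen for illustration, and the abstract does not specify which clustering criterion is actually used.

```python
from itertools import combinations

# Hypothetical toy ontology: concept -> (semantic type, related concepts).
ONTOLOGY = {
    "asthma": ("disease", {"bronchitis", "wheezing"}),
    "bronchitis": ("disease", {"asthma", "infection"}),
    "wheezing": ("symptom", {"asthma"}),
    "anemia": ("disease", {"leukemia"}),
    "leukemia": ("disease", {"anemia", "infection"}),
    "infection": ("disease", set()),
}

def candidates(doc, sem_type):
    """Related concepts of the given semantic type that are not yet in doc."""
    found = set()
    for concept in doc:
        ctype, related = ONTOLOGY.get(concept, (None, set()))
        if ctype != sem_type:
            continue
        for r in related:
            if r not in doc and ONTOLOGY.get(r, (None, set()))[0] == sem_type:
                found.add(r)
    return found

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def cluster_score(docs, labels):
    """Assumed quality measure: within-class minus between-class similarity."""
    within, between = [], []
    for i, j in combinations(range(len(docs)), 2):
        target = within if labels[i] == labels[j] else between
        target.append(jaccard(docs[i], docs[j]))
    return mean(within) - mean(between)

def expand(docs, labels, sem_type):
    """One expansion step: keep an added concept only if it raises the score."""
    docs = [set(d) for d in docs]
    proposed = [candidates(d, sem_type) for d in docs]
    best = cluster_score(docs, labels)
    for feature in sorted(set().union(*proposed)):
        trial = [d | {feature} if feature in p else d
                 for d, p in zip(docs, proposed)]
        score = cluster_score(trial, labels)
        if score > best:  # retain only features that improve clustering
            docs, best = trial, score
    return docs, best

# Hypothetical concept sets extracted from four short summaries.
summaries = [{"asthma"}, {"bronchitis", "cough"}, {"anemia"}, {"leukemia"}]
labels = ["respiratory", "respiratory", "blood", "blood"]
baseline = cluster_score(summaries, labels)
expanded, score = expand(summaries, labels, "disease")
```

In this toy run the concept `infection` is linked to diseases in both classes, so adding it makes documents from different classes more similar and it is discarded, while class-specific concepts such as `bronchitis` and `leukemia` are kept. Repeating `expand` on its own output would correspond to further steps of spreading activation.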
