Generalized Representation of Syntactic Structures

Analysis of language provides important insights into the underlying psychological properties of individuals and groups. While the majority of language analysis work in psychology has focused on semantics, psychological information is encoded not just in what people say but also in how they say it. In the current work, we propose Conversation Level Syntax Similarity Metric-Group Representations (CASSIM-GR). This tool builds generalized representations of the syntactic structures of documents, allowing researchers to distinguish between people and groups based on syntactic differences. CASSIM-GR builds on the Conversation Level Syntax Similarity Metric (CASSIM) by applying spectral clustering to syntactic similarity matrices and calculating the center of each cluster of documents. The resulting cluster centroid then represents the syntactic structure of the group of documents. To examine the effectiveness of CASSIM-GR, we conduct three experiments across three distinct corpora. In each experiment, we calculate the clustering accuracy and compare our proposed technique to a bag-of-words approach. Our results provide evidence for the effectiveness of CASSIM-GR and demonstrate that combining syntactic similarity and tf-idf semantic information improves the total accuracy of group classification.
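The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small, hand-made symmetric similarity matrix `S` (hypothetical values, not CASSIM output), restricts spectral clustering to a two-way normalized partition computed by power iteration, and takes the "center" of each cluster to be its medoid, that is, the document with the highest total similarity to the other members. The function names `spectral_bipartition` and `cluster_medoids` are illustrative only.

```python
import math
import random


def spectral_bipartition(S, iters=300, seed=0):
    """Split documents into two clusters from a symmetric similarity matrix S."""
    n = len(S)
    deg = [sum(row) for row in S]
    root = [math.sqrt(x) for x in deg]
    # Normalized affinity M = D^-1/2 S D^-1/2; its top eigenvector is sqrt(deg).
    M = [[S[i][j] / (root[i] * root[j]) for j in range(n)] for i in range(n)]
    total = math.sqrt(sum(deg))
    top = [r / total for r in root]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Project out the trivial top eigenvector, then apply (M + I) so the
        # iteration converges to the second-largest eigenvector of M.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        w = [v[i] + sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign of each component assigns the document to one of two clusters.
    return [0 if x < 0 else 1 for x in v]


def cluster_medoids(S, labels):
    """Return, per cluster, the index of the document most similar to its cluster."""
    medoids = {}
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        medoids[c] = max(members, key=lambda i: sum(S[i][j] for j in members))
    return medoids


# Hypothetical similarity matrix: documents 0-2 and 3-5 form two groups.
S = [
    [1.0, 0.9, 0.8, 0.1, 0.2, 0.1],
    [0.9, 1.0, 0.9, 0.2, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.8, 0.9],
    [0.2, 0.1, 0.1, 0.8, 1.0, 0.7],
    [0.1, 0.1, 0.2, 0.9, 0.7, 1.0],
]

labels = spectral_bipartition(S)
medoids = cluster_medoids(S, labels)
print(labels, medoids)
```

Which cluster receives label 0 or 1 depends on the sign of the eigenvector, so only the grouping is meaningful. With more than two groups, a full spectral clustering routine that accepts a precomputed affinity matrix would be the more natural choice.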
