A Graph-based Approach to Text Genre Analysis

Genre characterization can draw on a variety of methods that use lexical, syntactic, and presentation features of text to highlight key domain differences and stylistic preferences. However, these traditional methods cannot uncover some important macro-structural features embedded in text. Representing text as a word graph provides a framework for analyzing and identifying the key topological features that characterize text genres. In this study, we investigated graph features such as clustering coefficient, centralization, diameter, and average path length for eight text genres. The findings indicated key patterns that vary from one genre to another according to stylistic differences in the text. Furthermore, graph features such as the number of connected components and node heterogeneity provided evidence of subgenres.
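To make the graph features named above concrete, the following is a minimal sketch in plain Python, not the procedure used in the study. It assumes a word graph in which each distinct word is a node and two words are linked when they appear next to each other in the text; the abstract does not specify how the graphs were built, so this construction, the whitespace tokenization, and all function names are illustrative assumptions. Centralization is computed here as Freeman degree centralization, which is one common definition among several.

```python
from collections import deque
from itertools import combinations


def build_word_graph(text):
    """Undirected graph linking each word to the word that follows it.

    Assumption: adjacency-based co-occurrence with whitespace tokenization.
    """
    words = text.lower().split()
    graph = {w: set() for w in words}
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        components.append(component)
    return components


def average_clustering(graph):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def bfs_distances(graph, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def path_stats(graph, component):
    """Diameter and average shortest path length within one component."""
    diameter, total, pairs = 0, 0, 0
    for source in component:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                diameter = max(diameter, d)
                total += d
                pairs += 1
    return diameter, (total / pairs if pairs else 0.0)


def degree_centralization(graph):
    """Freeman degree centralization for an undirected simple graph."""
    n = len(graph)
    if n < 3:
        return 0.0
    degrees = [len(nbrs) for nbrs in graph.values()]
    max_deg = max(degrees)
    return sum(max_deg - d for d in degrees) / ((n - 1) * (n - 2))
```

Because diameter and average path length are only defined within a connected graph, `path_stats` takes a single component; a typical call would pass the largest one, for example `max(connected_components(g), key=len)`. Comparing these values across documents from different genres is the kind of analysis the abstract describes.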
