Keyword extraction by nonextensivity measure.

Long-range correlation in the spatial distribution of relevant word types, in contrast to the essentially random occurrence of irrelevant ones, is an important feature of human-written texts. We use nonextensive statistical mechanics to characterize the correlation between the occurrences of each word and to rank the words accordingly. In particular, we treat the nonextensivity parameter as an alternative measure of spatial correlation in the text and rank the words by this measure. Finally, we compare different methods for keyword extraction.
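
The abstract does not spell out how the nonextensivity parameter is estimated for each word, so the following is only a minimal sketch under stated assumptions. It assumes that the gaps between successive occurrences of a word follow a q-exponential survival function, and it picks the q for which the Tsallis q-logarithm of the empirical survival function is most nearly linear in the gap length. The function names (`q_log`, `estimate_q`, `rank_words_by_q`), the grid of q values, the plotting position, and the `min_count` threshold are all hypothetical choices made for illustration, not the authors' procedure.

```python
import math
from collections import defaultdict


def q_log(y, q):
    """Tsallis q-logarithm; reduces to the natural log as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(y)
    return (y ** (1.0 - q) - 1.0) / (1.0 - q)


def r_squared(xs, ys):
    """Squared Pearson correlation; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def estimate_q(gaps, q_grid=None):
    """Assumed estimator: the q in the grid that makes ln_q(S(x)) most linear in x."""
    if q_grid is None:
        q_grid = [1.0 + 0.01 * k for k in range(100)]  # q in [1.00, 1.99]
    xs = sorted(gaps)
    n = len(xs)
    # Empirical survival function at each sorted gap (midpoint plotting position).
    surv = [(n - i - 0.5) / n for i in range(n)]
    best_q, best_r2 = q_grid[0], -1.0
    for q in q_grid:
        r2 = r_squared(xs, [q_log(s, q) for s in surv])
        if r2 > best_r2:
            best_q, best_r2 = q, r2
    return best_q


def rank_words_by_q(tokens, min_count=5):
    """Rank words by estimated q, largest first (assumed to mean strongest clustering)."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    scores = {}
    for word, pos in positions.items():
        if len(pos) >= min_count:
            gaps = [b - a for a, b in zip(pos, pos[1:])]
            scores[word] = estimate_q(gaps)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under this assumption, a word whose gaps are exponentially distributed (random placement) yields q close to 1, while a word with heavy-tailed gaps (bursts separated by long absences) yields a larger q. Calling `rank_words_by_q(text.lower().split())` on a tokenized text would return `(word, q)` pairs in descending order of q.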
