A Text Network Representation Model

Text representation is the basis of text processing. Most current text representation models ignore the inter-relations between words, which results in the loss of the text's structural information. This paper proposes a novel text representation model that uses a lexical network to represent the text and thereby retains the text's structure. According to the level at which words are related, three kinds of network are introduced: the co-occurrence network, the syntactic network and the semantic network. To evaluate the representational ability of the text network model, we investigated its application to two language processing tasks: unsupervised keyword extraction and text classification. The experimental results show that the model can be used successfully for natural language processing.
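As an illustration of the simplest of the three networks, the following is a minimal sketch of a co-occurrence network, not the construction used in the paper. It assumes whitespace tokenization, a sliding window of two following words, and undirected edges weighted by co-occurrence count; the function names `build_cooccurrence_network` and `rank_by_degree` are hypothetical. Ranking words by the number of distinct neighbours stands in for an unsupervised keyword score and is likewise an assumption.

```python
from collections import defaultdict


def build_cooccurrence_network(tokens, window=2):
    """Build an undirected weighted graph: graph[word][neighbour] -> count.

    Two words are linked when they appear within `window` positions of
    each other (the window size is an assumption, not taken from the paper).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        # Look ahead only, so each pair of positions is counted once.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            graph[word][other] += 1
            graph[other][word] += 1
    return graph


def rank_by_degree(graph):
    """Order words by number of distinct neighbours, ties broken alphabetically."""
    return sorted(graph, key=lambda w: (-len(graph[w]), w))


tokens = "text network model represents text structure with a lexical network".split()
graph = build_cooccurrence_network(tokens, window=2)
candidates = rank_by_degree(graph)[:2]
```

A syntactic or semantic network could reuse the same graph structure, with edges drawn from a parser or a lexical resource instead of a sliding window.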
