Rapid Increase of the Weighted Shortest Path Length in Key Term Concurrence Network and Its Origin

In previous work, we constructed a Key Term Concurrence Network (KTCN) from a large-scale corpus and used the weighted shortest path length between terms as a measure of their semantic relevance. We applied this measure tentatively to query expansion in an Information Retrieval task aimed at complex user queries expressed in natural language. The experimental data showed improved performance on the task. However, we also found that, as more expanded terms are appended to the original query vector, performance reaches a peak and then decreases drastically. This paper explains the causes of this phenomenon from two perspectives: the properties of complex networks and corpus linguistics. Based on these findings, we outline directions for improving the approach in future work.
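As an illustration of the idea only, the following minimal Python sketch ranks candidate expansion terms by weighted shortest path length on a toy co-occurrence network. It is not the authors' implementation: the edge weight (the reciprocal of a co-occurrence count), the toy counts, and the helper names `build_graph`, `weighted_shortest_path_lengths`, and `expand_query` are all assumptions made for this example.

```python
import heapq


def build_graph(cooccurrence_counts):
    """Build an undirected weighted graph from {(term_a, term_b): count}.

    Assumption: edge weight = 1 / count, so frequently co-occurring
    terms are "closer". The paper's actual weighting may differ.
    """
    graph = {}
    for (a, b), count in cooccurrence_counts.items():
        weight = 1.0 / count
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def weighted_shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: weighted shortest path length from source to every reachable term."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def expand_query(graph, query_terms, k):
    """Return the k non-query terms with the smallest path length to any query term."""
    best = {}
    for q in query_terms:
        for term, d in weighted_shortest_path_lengths(graph, q).items():
            if term not in query_terms and d < best.get(term, float("inf")):
                best[term] = d
    # Sort by distance, then alphabetically to break ties deterministically.
    return sorted(best, key=lambda t: (best[t], t))[:k]


# Hypothetical co-occurrence counts, for illustration only.
counts = {
    ("network", "graph"): 4,
    ("graph", "path"): 2,
    ("network", "corpus"): 1,
    ("corpus", "term"): 2,
}
graph = build_graph(counts)
print(expand_query(graph, ["network"], 2))
```

In this toy setup, raising `k` admits terms at progressively larger path lengths from the query, which is the situation the abstract describes when appending more expanded terms to the query vector.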
