Unveiling the relationship between complex networks metrics and word senses

The automatic disambiguation of word senses (i.e., identifying which meaning of a polysemous word is used in a given context) is essential for applications such as machine translation and information retrieval, and is a key step toward the so-called Semantic Web. Humans disambiguate words with little effort, but computers do not. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and we show that word senses can be distinguished by characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system; nevertheless, in half of the cases our approach outperformed traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that, when combined with traditional techniques, the complex network approach may help to improve the discrimination of senses in large texts.
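To make the idea of "local structure around ambiguous words" concrete, the following minimal Python sketch builds a word adjacency network from a token list and computes two local measurements for a target word: its clustering coefficient and the number of words exactly two hops away, used here as a simple proxy for hierarchical connectivity. The abstract does not specify the preprocessing, the network construction, or the exact metric definitions, so the function names (build_adjacency_network, clustering_coefficient, ring_size), the rule of linking consecutive tokens, and the toy sentence are all assumptions made for illustration rather than the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency_network(tokens):
    """Undirected graph linking each pair of consecutive, distinct tokens.

    Assumption: plain adjacency linking; the paper's actual preprocessing
    (e.g. stopword removal or lemmatization) is not described in the abstract.
    """
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = graph.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def ring_size(graph, node, level=2):
    """Number of nodes exactly `level` hops away from `node`.

    Assumption: used here as a simple stand-in for a hierarchical
    (second-level) connectivity measurement.
    """
    visited = {node}
    ring = {node}
    for _ in range(level):
        next_ring = set()
        for current in ring:
            next_ring |= graph.get(current, set())
        ring = next_ring - visited
        visited |= ring
    return len(ring)


# Toy example (hypothetical text, not from the paper's corpus).
tokens = "the bank approved the loan the river bank flooded".split()
graph = build_adjacency_network(tokens)
features = (clustering_coefficient(graph, "bank"), ring_size(graph, "bank"))
print(features)
```

In a disambiguation setting, one would presumably compute such measurements for each occurrence of the ambiguous word and pass the resulting feature vectors to a classifier; that step is omitted here because the abstract does not describe it.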
