BorderFlow: A Local Graph Clustering Algorithm for Natural Language Processing

In this paper, we introduce BorderFlow, a novel local graph clustering algorithm, and apply it to natural language processing problems. We first present a formal description of the algorithm. We then use BorderFlow to cluster large graphs and to extract concepts from word similarity graphs. The large graphs are extracted from the Wikipedia Category Graph. The subsequent low-bias extraction of concepts is carried out on two data sets consisting of noisy and clean data. We show that BorderFlow efficiently computes clusters of high quality and purity. BorderFlow can therefore be integrated into several other natural language processing applications.
