Benchmarking High Performance Architectures with Natural Language Processing Algorithms

Natural Language Processing algorithms are resource-demanding, especially when they must be tuned to an inflective language such as Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high-performance platforms of different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms are compared.
