Kernel-based machine learning for fast text mining in R

Recent advances in kernel-based machine learning allow fast processing of text with string kernels built on suffix arrays. The R package kernlab provides both an infrastructure for kernel methods and a large collection of implemented algorithms, including suffix-array-based string kernels. Combined with the text mining infrastructure of the tm package, kernlab gives R the functionality to process, visualize, and group large collections of text data using kernel methods. The emphasis is on the performance of various types of string kernels at these tasks.
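To make the notion of a string kernel concrete, the sketch below computes a k-spectrum kernel, one common type of string kernel, by counting shared substrings of length k. It is an illustration only: it is written in Python rather than R, it uses plain hash-based counting instead of the suffix arrays the abstract refers to, and the names `spectrum_counts`, `spectrum_kernel`, and `kernel_matrix` are hypothetical helpers, not part of kernlab or tm.

```python
from collections import Counter
from math import sqrt


def spectrum_counts(text: str, k: int) -> Counter:
    """Count every contiguous substring of length k in text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3, normalized: bool = True) -> float:
    """k-spectrum kernel: inner product of the k-mer count vectors of a and b."""
    ca, cb = spectrum_counts(a, k), spectrum_counts(b, k)
    # Only substrings present in both strings contribute to the sum.
    value = float(sum(n * cb[s] for s, n in ca.items()))
    if not normalized:
        return value
    # Normalization divides by sqrt(k(a, a) * k(b, b)), so k(a, a) == 1.
    norm = sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return value / norm if norm > 0 else 0.0


def kernel_matrix(docs, k: int = 3):
    """Pairwise kernel values for a list of documents (naive, quadratic)."""
    return [[spectrum_kernel(x, y, k) for y in docs] for x in docs]


if __name__ == "__main__":
    docs = ["crude oil prices rise", "oil prices fall", "wheat harvest grows"]
    for row in kernel_matrix(docs, k=3):
        print(["%.2f" % v for v in row])
```

A matrix of this kind is what kernel-based clustering or classification methods take as input. The naive version recounts substrings for every pair of documents, which is the cost that suffix-array-based implementations are designed to avoid.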
