Using shallow semantic analysis and graph modelling for document classification

Modelling text content with a graph-based approach driven by shallow semantic analysis makes it possible to extract additional information about the meaning of the text. This paper discusses two novel algorithms based on this idea. They are compared against the ‘legacy’ bag-of-words approach and the approach of Schenker et al. in an NN document classification task.
