Latent Semantic Analysis for German Literature Investigation

This paper presents the results of experiments applying Latent Semantic Analysis (LSA) to textual data. The method is explained briefly, with special attention to its potential for comparing and investigating German literary texts. Two hypotheses are tested: 1) texts by the same author are alike and can be distinguished from texts by other authors; 2) prose and poetry can be told apart automatically.
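
As a rough illustration of the kind of comparison LSA makes possible, the sketch below builds a term-by-document count matrix, reduces it with a truncated singular value decomposition, and compares documents by cosine similarity in the reduced space. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `lsa_document_similarity`, the whitespace tokenizer, the raw-count weighting, the value of `k`, and the toy documents are all hypothetical choices made for illustration.

```python
import numpy as np

def lsa_document_similarity(docs, k=2):
    """Return a document-by-document cosine similarity matrix in a k-dimensional LSA space."""
    # Assumption: naive lowercase whitespace tokenization, raw term counts.
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-document matrix: rows are terms, columns are documents.
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t in doc:
            X[index[t], j] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T  # one row per document

    # Cosine similarity; documents with a zero vector keep similarity 0.
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = doc_vecs / norms
    return unit @ unit.T

# Hypothetical toy documents (invented strings, not taken from the paper's corpus).
toy_docs = ["a b", "a b", "c d"]
sim = lsa_document_similarity(toy_docs, k=2)
print(np.round(sim, 3))
```

In an authorship or genre experiment of the sort the abstract describes, one would presumably inspect whether texts with the same label end up with higher pairwise similarity than texts with different labels; how the paper actually weights terms or chooses the number of dimensions is not stated here.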
