Parameters Driving Effectiveness of LSA on Topic Segmentation

Latent Semantic Analysis (LSA) is an efficient statistical technique for extracting semantic knowledge from large corpora. A major difficulty with this technique is identifying the most effective LSA parameters and the best combination of them. In this paper, we therefore propose a new topic segmenter that allows us to study in depth the different parameters of LSA for topic segmentation. The aim of the study is to analyze the effect of these parameters on the quality of topic segmentation and to identify the most effective ones. Extensive experiments show that segmentation quality is highly sensitive to the choice of LSA parameters. More importantly, the study allows us to propose appropriate recommendations for selecting parameters in the field of topic segmentation.
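
To make concrete what an "LSA parameter" can look like in a segmentation setting, the following is a minimal sketch, not the segmenter proposed in the paper. It assumes two illustrative parameters that the abstract does not name, the term weighting scheme (`weighting`) and the number of retained dimensions (`k`), and it assumes that topic boundaries are sought where the similarity between adjacent sentences drops. The function names and the toy sentences are hypothetical.

```python
import numpy as np


def lsa_sentence_vectors(sentences, k=2, weighting="tf"):
    """Project sentences into a k-dimensional LSA space.

    `k` and `weighting` are illustrative parameters chosen for this sketch;
    they are assumptions, not the settings studied in the paper.
    """
    tokens = [s.lower().split() for s in sentences]
    vocab = sorted({w for sent in tokens for w in sent})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence count matrix: one row per term, one column per sentence.
    X = np.zeros((len(vocab), len(sentences)))
    for j, sent in enumerate(tokens):
        for w in sent:
            X[index[w], j] += 1.0

    # Optional reweighting of the raw counts.
    if weighting == "log":
        X = np.log1p(X)
    elif weighting == "tfidf":
        df = np.count_nonzero(X, axis=1)  # number of sentences containing each term
        X = X * np.log(len(sentences) / df)[:, None]
    elif weighting != "tf":
        raise ValueError(f"unknown weighting: {weighting}")

    # Truncated SVD: keep only the k largest singular values.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(S))
    return Vt[:k].T * S[:k]  # one row per sentence


def adjacent_similarities(vectors):
    """Cosine similarity between each sentence vector and the next one."""
    sims = []
    for a, b in zip(vectors[:-1], vectors[1:]):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b / denom) if denom > 0 else 0.0)
    return sims


# Toy input: two sentences about pets followed by two about finance.
toy_sentences = [
    "cat dog cat",
    "dog cat",
    "stock market",
    "market stock stock",
]
toy_sims = adjacent_similarities(lsa_sentence_vectors(toy_sentences, k=2))
candidate_boundary = toy_sims.index(min(toy_sims))  # gap with the lowest similarity
```

In this sketch, the gap with the lowest adjacent similarity is treated as a candidate topic boundary. Changing `k` or `weighting` changes the similarity profile, which is the kind of sensitivity the abstract refers to.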
