ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches: Digging for Nuggets of Wisdom in Text
This paper describes the design of a new ScaleText system aimed
at scalable semantic indexing of heterogeneous textual corpora.
We discuss the design decisions that led to a modular system
architecture for indexing and searching using semantic vectors
of document segments – nuggets of wisdom. The prototype system
implementation is evaluated by applying Latent Semantic
Indexing (LSI) to the Enron corpus. The Bpref measure is
used to automate the comparison of different
algorithms and system configurations.
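
To illustrate how a Bpref score can be computed for one query, the following is a minimal sketch that assumes the commonly used definition of the measure, in which unjudged documents are ignored and each retrieved relevant document is penalized by the number of judged non-relevant documents ranked above it. The abstract does not state which Bpref variant ScaleText uses, so the function name `bpref`, its arguments, and the document identifiers are hypothetical.

```python
def bpref(ranking, relevant, nonrelevant):
    """Bpref for one ranked result list (assumed standard definition).

    ranking     -- list of document ids, best match first
    relevant    -- set of ids judged relevant for the query
    nonrelevant -- set of ids judged non-relevant for the query
    Documents in neither set are unjudged and are skipped.
    """
    num_rel = len(relevant)
    if num_rel == 0:
        return 0.0
    denom = min(num_rel, len(nonrelevant))
    nonrel_seen = 0  # judged non-relevant documents seen so far
    total = 0.0
    for doc_id in ranking:
        if doc_id in relevant:
            if denom == 0:
                # No judged non-relevant documents: no penalty possible.
                total += 1.0
            else:
                total += 1.0 - min(nonrel_seen, denom) / denom
        elif doc_id in nonrelevant:
            nonrel_seen += 1
    # Relevant documents that were never retrieved contribute zero.
    return total / num_rel


if __name__ == "__main__":
    # Hypothetical judgments: "x" is unjudged and does not affect the score.
    ranking = ["d1", "n1", "d2", "x", "n2", "d3"]
    print(bpref(ranking, {"d1", "d2", "d3"}, {"n1", "n2"}))
```

Because unjudged documents are skipped, scores of this kind can be compared across algorithms and configurations even when relevance judgments cover only part of the corpus.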