We present a method for visualizing text corpora that contain both labeled and unlabeled documents. Our method learns data mappings of the labeled documents, including the terms that are most relevant for label discrimination. This information can then be used to visualize the mapped unlabeled documents as well. We also show how the method incorporates user feedback. Feedback is supplied iteratively, so that users can draw on the output of the method to contribute their domain knowledge of the data. In addition, the technique yields a new low-dimensional space in which traditional clustering or classification methods can be applied. Although our approach can handle document labels that are discrete classes, continuous values, or associated vectors, we confine the experiments in this article to labels that represent non-overlapping topics. The approach is evaluated on a set of short and noisy documents, a setting considered challenging in the text mining literature.