Paper information - From Vector Space Models to Vector Space Models of Semantics


This paper assesses the performance of frequency-based and concept-based text representations in the Mixed Script Information Retrieval and classification tasks. In text analytics, representation remains an open research problem whose resolution is needed to make further progress in many applications. The paper presents observations from applying different text representation methods to text classification and information retrieval. The experiments use the data set from the Mixed Script Information Retrieval shared task, and the performance of the final submitted model was evaluated by the task organizers. Distributional representations were observed to perform better than frequency-based text representation methods. The final system attained first place in task 2 and scored 3.89% below the top-scoring system in task 1.
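To make the frequency-based side of this comparison concrete, here is a minimal sketch of a TF-IDF vector space model used for retrieval with cosine similarity. It is not the system described in the paper: the function names (`build_tfidf`, `vectorize_query`, `cosine`, `rank`), the whitespace tokenizer, the smoothed IDF variant, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for this toy example.
    return text.lower().split()


def build_tfidf(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed IDF (one common variant, chosen here as an assumption).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors, idf


def vectorize_query(query, idf):
    # Terms never seen in the collection are dropped from the query vector.
    tf = Counter(tokenize(query))
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = build_tfidf(docs)
    q = vectorize_query(query, idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Sort by descending score, breaking ties by document index.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    # Hypothetical toy collection of short romanised queries.
    docs = ["tum hi ho lyrics", "cricket score today", "weather today delhi"]
    print(rank("cricket score", docs))
```

A purely frequency-based model like this one only matches documents that share surface tokens with the query, which is one reason spelling variation in transliterated text is hard for it. Concept-based or distributional representations instead map terms into a lower-dimensional space where related forms can lie close together.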
