A New Subject-Based Document Retrieval from Digital Libraries Using Vector Space Model

Document retrieval from digital libraries based on a user's query is strongly affected by the terms that appear in the query. In many cases, a digital library contains documents that do not share exactly the same terms with the query but are still relevant to the user's need. We address this challenge by introducing a new subject-based retrieval approach in which, in addition to ranking documents by the terms in the query, a new subject-based score is defined between the query and each document. We define this score through a new vector space model in which a vectorized subject-based representation is built for each document and its keywords, as well as for the terms in the query. We tested the new subject-based scoring scheme on a database of scientific papers obtained from Web of Science. Our experimental results show that in 83% of cases users prefer the proposed scoring scheme over the classic scoring schemes.
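The general idea of combining a term-based score with a subject-based score can be illustrated with a minimal sketch. This is not the scoring scheme of the paper, whose details are not given in the abstract. The term-to-subject mapping `term_subjects`, the helper names, the use of cosine similarity, and the mixing weight `alpha` are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dict-like maps."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def subject_vector(terms, term_subjects):
    """Sum the (hypothetical) subject weights of every term into one vector."""
    vec = Counter()
    for term in terms:
        for subject, weight in term_subjects.get(term, {}).items():
            vec[subject] += weight
    return dict(vec)


def combined_score(query_terms, doc_terms, term_subjects, alpha=0.5):
    """Blend a term-overlap score with a subject-overlap score.

    alpha is an assumed mixing weight, not a value taken from the paper.
    """
    term_score = cosine(Counter(query_terms), Counter(doc_terms))
    subject_score = cosine(
        subject_vector(query_terms, term_subjects),
        subject_vector(doc_terms, term_subjects),
    )
    return alpha * term_score + (1 - alpha) * subject_score


# Hypothetical mapping from terms to subject weights (illustrative only).
term_subjects = {
    "retrieval": {"information retrieval": 1.0},
    "ranking": {"information retrieval": 1.0},
    "protein": {"biology": 1.0},
}

query = ["retrieval"]
doc_a = ["ranking"]   # shares no term with the query, but shares its subject
doc_b = ["protein"]   # shares neither a term nor a subject

score_a = combined_score(query, doc_a, term_subjects)
score_b = combined_score(query, doc_b, term_subjects)
print(score_a, score_b)
```

In this toy setup, `doc_a` has no term in common with the query yet still receives a non-zero score because its terms map to the same subject, which is the kind of case the abstract describes.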
