DSSM with text hashing technique for text document retrieval in next-generation search engine for big data and data analytics

The digital world is arriving, and data has become big data as the volume of digital information available as text documents keeps growing. As a result, extracting, enriching, analyzing, and retrieving text documents, which are unstructured in nature, has become a major problem for search engines. Text documents have traditionally been where we store information, whether personal or professional. Today they are generated at very high speed, and the data must be processed in time to keep the search engine up to date. This also matters to organizations, both private and public, that have been collecting large volumes of domain-specific text documents, which may contain national intelligence, educational, medical, business, and marketing information. In this paper we present a system that enriches the retrieval of text documents from unstructured data in a search engine. It brings big data and data analytics into the educational sector, making the best of both worlds by using deep-structured semantic modeling (DSSM) with text hashing, and proposes a next-generation search engine.
