Paper Information - Predicting DataSpace Retrieval Using Probabilistic Hidden Information

This paper discusses the issues involved in designing a complete information retrieval system for DataSpace based on probabilistic user-relevance schemes. First, an Information Hidden Model (IHM) is constructed that takes into account the users' perception of similarity between documents. The system accumulates feedback from users and employs it to construct user-oriented clusters. The IHM integrates uncertainty over multiple interdependent classifications and collectively determines the most likely global assignment. Second, three query-related learning strategies are proposed, namely UHH, UHB, and UHS (User Hidden Habit, User Hidden Background, and User Hidden keyword Semantics), to closely represent the user's mind. Finally, the probability ranking principle shows that optimum retrieval quality can be achieved under certain assumptions, and an optimization algorithm is developed to improve the effectiveness of the probabilistic process. We first predict the data sources where the query results could be found. Therefore, compared with existing approaches, our retrieval precision is better and does not depend on the size or the heterogeneity of the DataSpace.
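To make the source-prediction step more concrete, the following is a minimal sketch of ranking data sources by estimated probability of relevance, in the spirit of the probability ranking principle. It is not the paper's IHM or its optimization algorithm: the function names, the source names, the feedback counts, and the Laplace-smoothed estimate are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def estimate_relevance(relevant: int, shown: int) -> float:
    """Laplace-smoothed estimate of P(relevant | source).

    Assumption: user feedback is stored as (relevant, shown) counts
    per data source. The abstract does not specify an estimator.
    """
    return (relevant + 1) / (shown + 2)


def rank_sources(
    feedback: Dict[str, Tuple[int, int]], top_k: int
) -> List[Tuple[str, float]]:
    """Rank sources by decreasing estimated probability of relevance."""
    scored = [
        (name, estimate_relevance(relevant, shown))
        for name, (relevant, shown) in feedback.items()
    ]
    # Sort by probability (descending), breaking ties by name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical feedback accumulated for one query: (relevant, shown).
feedback = {
    "mail_archive": (0, 10),
    "file_system": (8, 10),
    "relational_db": (3, 10),
}

selected = rank_sources(feedback, top_k=2)
selected_names = [name for name, _ in selected]
```

Under these assumed counts, only the two highest-scoring sources would be queried, which illustrates why a source-prediction step can keep retrieval cost from growing with the total number of sources.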
