A heuristic approach for λ-representative information retrieval from large-scale data

Abstract: Retrieving representative information from large-scale data has become an important research issue, especially in the context of mobile business/search, where screen size and navigability are limited. This paper focuses on certain aspects of representativeness in database queries and web search, and proposes an approach that extracts from the original search results a subset with high coverage and low redundancy. The paper introduces the notion of λ-represent, which describes the λ-represent relationship between sets of data objects. The λ-representative problem is then formulated as an extension of the typical set covering problem, which leads to a heuristic approach (namely, LamRep) that copes with the problem effectively and efficiently. Notably, LamRep incorporates a “vote” mechanism and is enhanced with an algorithmic acceleration strategy. Experiments on benchmark data and a real-world example show that LamRep outperforms the other approaches.
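
To make the set-covering view concrete, the following is a minimal sketch of the classic greedy set-cover heuristic applied to representative selection. It is not LamRep: the abstract does not spell out LamRep's vote mechanism or acceleration strategy, so neither is reproduced here. The function name `greedy_representatives`, the `similarity` callback, the budget `k`, and the rule that one object λ-represents another when their similarity is at least λ are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_representatives(
    items: Sequence[T],
    similarity: Callable[[T, T], float],
    lam: float,
    k: int,
) -> List[int]:
    """Pick up to k indices whose items cover as many items as possible.

    Assumption (not taken from the paper): item i "lambda-represents"
    item j when similarity(items[i], items[j]) >= lam.
    """
    n = len(items)
    # covers[i] holds the indices of all items that item i represents.
    covers = [
        {j for j in range(n) if similarity(items[i], items[j]) >= lam}
        for i in range(n)
    ]
    uncovered = set(range(n))
    chosen: List[int] = []
    while uncovered and len(chosen) < k:
        # Greedy step: take the item that covers the most uncovered items.
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            break  # no remaining item can cover anything new
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy data: three loose clusters of numbers.
values = [1.0, 1.1, 1.2, 5.0, 5.1, 9.0]


def closeness(a: float, b: float) -> float:
    # Hypothetical similarity in (0, 1]; equals 1 for identical values.
    return 1.0 / (1.0 + abs(a - b))


print(greedy_representatives(values, closeness, lam=0.8, k=3))  # [0, 3, 5]
```

Each pick removes the objects it covers from further consideration, so later picks favor regions of the result set that are not yet represented. That is the sense in which a covering formulation trades coverage against redundancy.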

Jin Zhang | Guoqing Chen | Qiang Wei
