Multidisciplinary Information Retrieval

Pseudo-relevance feedback (PRF) is an effective technique in information retrieval, yet many experiments have shown that it is ineffective in patent retrieval. The quality of the initial results in patent retrieval is poor, so estimating a relevance model via PRF often introduces off-topic terms and hurts retrieval performance. We propose a learning-to-rank framework that estimates how effective a patent document would be as a feedback document in PRF. Specifically, knowledge of which feedback documents were effective for past queries is used to predict effective feedback documents for new queries. To do so, we introduce features that correlate with feedback-document effectiveness, defined using patent-specific content, and we apply regression to predict document effectiveness from these features. We evaluated the proposed method on the CLEF-IP 2010 patent prior-art search collection. Our experimental results show significantly improved retrieval accuracy over a PRF baseline that expands the query using all top-ranked documents.
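A minimal sketch of the pipeline described above, under stated assumptions: each candidate feedback document is represented by a feature vector, a regression model is fit on past queries whose feedback documents carry an effectiveness label, and only the documents with the highest predicted effectiveness are used for expansion, instead of all top-ranked documents as in the baseline. The two features (a hypothetical IPC-class overlap with the query patent and the initial retrieval score), the labels (hypothetical per-document changes in average precision), the ridge regressor, and the frequency-based `expand_query` helper (a simplified stand-in for relevance-model estimation) are illustrative choices, not the authors' method; all data is made up.

```python
from collections import Counter
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(features: Sequence[Sequence[float]], targets: Sequence[float],
              lam: float = 1e-3) -> List[float]:
    """Ridge regression (an assumed regressor) with an unpenalised bias w[0]."""
    X = [[1.0] + list(f) for f in features]  # prepend a bias column
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j and i > 0 else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: Sequence[float], f: Sequence[float]) -> float:
    """Predicted feedback effectiveness of one document."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))


def select_feedback(cands: Dict[str, List[float]], w: Sequence[float], k: int) -> List[str]:
    """Rank candidate feedback documents by predicted effectiveness; keep the top k."""
    return sorted(cands, key=lambda d: predict(w, cands[d]), reverse=True)[:k]


def expand_query(query: List[str], docs: Dict[str, List[str]],
                 chosen: List[str], n_terms: int) -> List[str]:
    """Hypothetical helper: add the n most frequent non-query terms of the chosen docs."""
    counts = Counter(t for d in chosen for t in docs[d] if t not in query)
    return query + [t for t, _ in counts.most_common(n_terms)]


# Hypothetical training data from past queries. Features are
# [ipc_overlap, initial_score]; labels are made-up changes in AP observed
# when the document alone is used for feedback.
train_X = [[1.0, 0.9], [0.0, 0.8], [0.5, 0.7], [0.0, 0.95], [1.0, 0.6]]
train_y = [0.2, -0.1, 0.05, -0.1, 0.2]
weights = fit_ridge(train_X, train_y)

# Candidate feedback documents for a new query (hypothetical IDs, features, terms).
cands = {"EP-A": [1.0, 0.5], "EP-B": [0.0, 0.99], "EP-C": [0.5, 0.9]}
doc_terms = {"EP-A": ["valve", "piston", "seal", "valve"],
             "EP-B": ["battery", "anode", "cathode"],
             "EP-C": ["seal", "gasket", "valve"]}
query = ["valve", "pressure"]

selected = select_feedback(cands, weights, k=2)  # ['EP-A', 'EP-C']
expanded = expand_query(query, doc_terms, selected, n_terms=2)
print(selected, expanded)  # ['EP-A', 'EP-C'] ['valve', 'pressure', 'seal', 'piston']
```

With these toy numbers the regressor learns that IPC overlap, not the initial score, predicts effectiveness, so it keeps EP-A and EP-C and drops EP-B even though EP-B has the highest initial score; expanding with all three documents would instead make off-topic terms such as "battery" candidates for expansion.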
