A case for probabilistic logic for scalable patent retrieval

Patent retrieval has emerged as an important application of information retrieval. Inherent properties of patent searching, such as large corpora, long documents and specialised terminology, have created the need for alternative approaches to searching. Logic-based information retrieval, as modelled by DB+IR systems, can accommodate these needs through its power of abstraction and its use of database-friendly query languages. However, there is a trade-off between expressiveness and efficiency. We propose to tackle these efficiency issues through distribution and parallelisation. In this paper we argue for a parallelised patent searching solution built on top of a probabilistic DB+IR system. Our contributions are both conceptual and technical. We demonstrate the flexibility of this approach by modelling two resource selection algorithms in probabilistic logic, expressed in probabilistic Datalog (a rule-based language designed for expressing database-related tasks). We then provide early experimental indications that support the feasibility and technical soundness of this approach.