In this paper, we describe the participation of the Information and Language Processing Systems (ILPS) group at CLEF eHealth 2019 Task 2.2: Technologically Assisted Reviews in Empirical Medicine. The task aims to produce an efficient ordering of the documents and to identify a subset of the documents that contains as many of the relevant abstracts as possible with the least effort. Participants are provided with systematic review topics, each including a review title, a Boolean query constructed by Cochrane experts, and a set of PubMed Document Identifiers (PIDs) returned by running the Boolean query in MEDLINE. We handle the problem under the Continuous Active Learning framework by jointly training a ranking model to rank documents and conducting “greedy” sampling to estimate the true number of relevant documents in the collection. We submitted four runs.
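The general shape of such a loop can be illustrated with a minimal sketch. It is not the submitted system: the abstract does not specify the ranking model or the estimator, so the term-count scorer and the Horvitz-Thompson-style estimator below are stand-ins chosen for illustration, and all names (`train`, `rank`, `cal_loop`, `horvitz_thompson_total`, the toy documents) are hypothetical.

```python
from collections import Counter


def train(docs, labels, seed_terms):
    """Toy ranking model (an assumption, not the paper's model):
    start from the seed terms, then add +1 per term of each judged
    relevant document and -1 per term of each judged non-relevant one."""
    weights = Counter(seed_terms)
    for doc_id, rel in labels.items():
        for term in set(docs[doc_id]):
            weights[term] += 1 if rel else -1
    return weights


def rank(docs, weights, judged):
    """Rank unjudged documents by summed term weight, highest first;
    ties are broken by document id so the order is deterministic."""
    scored = [(sum(weights[t] for t in set(terms)), doc_id)
              for doc_id, terms in docs.items() if doc_id not in judged]
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


def cal_loop(docs, oracle, seed_terms, batch_size=1, budget=3):
    """Continuous active learning: retrain, rank, judge the top batch, repeat
    until the judgment budget is spent."""
    labels = {}   # doc_id -> True/False judgments collected so far
    order = []    # order in which documents were shown to the reviewer
    budget = min(budget, len(docs))
    while len(labels) < budget:
        weights = train(docs, labels, seed_terms)
        ranking = rank(docs, weights, labels)
        for doc_id in ranking[:min(batch_size, budget - len(labels))]:
            labels[doc_id] = oracle(doc_id)
            order.append(doc_id)
    return order, labels


def horvitz_thompson_total(sampled, inclusion_prob):
    """Estimate the number of relevant documents in the collection from a
    sample: each sampled label (0 or 1) is weighted by 1 / inclusion probability."""
    return sum(rel / inclusion_prob[d] for d, rel in sampled.items())


# Toy collection; the relevant set plays the role of the human reviewer.
docs = {
    "d1": ["heart", "trial"],
    "d2": ["heart", "surgery"],
    "d3": ["weather", "report"],
    "d4": ["trial", "surgery"],
    "d5": ["football", "score"],
}
relevant = {"d1", "d2", "d4"}
order, labels = cal_loop(docs, lambda d: d in relevant, seed_terms=["heart"])
```

In this sketch the model is retrained after every batch, so feedback on early documents immediately reshapes the ranking of the remaining ones. The estimator is kept separate from the loop because it only needs the sampled labels and their inclusion probabilities, however the sample was drawn.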