Mining linguistic cues for query expansion: applications to drug interaction search

Given a drug under development, which other drugs or biochemical compounds might it interact with? Answering this question early, by mining the literature, is valuable to pharmaceutical companies, both financially and in avoiding public relations nightmares. Inferring drug-drug interactions is also important for designing combination therapies for complex diseases such as cancer. We cast this problem as one of mining linguistic cues for query expansion. Using only positive instances of drug interactions, we show how to extract linguistic cues that can then be used to expand and reformulate queries, improving the effectiveness of drug interaction search. Our approach integrates several learning paradigms: partially supervised classification, association measures for collocation mining, and feature selection in supervised learning. We demonstrate compelling results when positive examples from the DrugBank database are used to seed MEDLINE searches for drug interactions. In particular, we show that purely data-driven linguistic cues can be mined effectively and applied to build a successful domain-specific query expansion framework.
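The abstract names association measures for collocation mining as one ingredient but does not say which measure is used. The code below is a minimal sketch under stated assumptions, not the paper's method. It assumes Dunning's log-likelihood ratio (G²) as the association measure, assumes the input is a list of tokenized sentences already known to describe interacting drug pairs, and restricts candidates to adjacent word pairs. It scores those bigrams and turns the top-ranked ones into an expanded Boolean query. The function names (`rank_bigrams`, `expand_query`) and the query syntax are hypothetical. The partially supervised classification and feature-selection steps are not shown.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Bigram = Tuple[str, str]


def log_likelihood(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio G^2 for a 2x2 bigram contingency table.

    k11 = count(w1 w2), k12 = count(w1 followed by a word other than w2),
    k21 = count(a word other than w1 followed by w2), k22 = all other bigrams.
    """
    n = k11 + k12 + k21 + k22
    observed = ((k11, k12), (k21, k22))
    row_totals = (k11 + k12, k21 + k22)
    col_totals = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            k = observed[i][j]
            if k > 0:  # 0 * ln(0) is taken as 0
                expected = row_totals[i] * col_totals[j] / n
                g2 += k * math.log(k / expected)
    return 2.0 * g2


def rank_bigrams(sentences: Iterable[List[str]]) -> List[Tuple[Bigram, float]]:
    """Score every adjacent word pair by G^2, highest first.

    Assumption: `sentences` are tokenized sentences drawn from positive
    (known-interacting) drug pairs; this is an illustrative choice only.
    """
    bigrams: Counter = Counter()
    for tokens in sentences:
        bigrams.update(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())
    left: Counter = Counter()   # how often each word starts a bigram
    right: Counter = Counter()  # how often each word ends a bigram
    for (w1, w2), count in bigrams.items():
        left[w1] += count
        right[w2] += count
    scored = []
    for (w1, w2), k11 in bigrams.items():
        k12 = left[w1] - k11
        k21 = right[w2] - k11
        k22 = n - k11 - k12 - k21
        scored.append(((w1, w2), log_likelihood(k11, k12, k21, k22)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def expand_query(drug_a: str, drug_b: str, cues: Sequence[Bigram]) -> str:
    """Build a hypothetical Boolean query: both drug names plus any mined cue phrase."""
    cue_clause = " OR ".join(f'"{" ".join(cue)}"' for cue in cues)
    return f'"{drug_a}" AND "{drug_b}" AND ({cue_clause})'
```

On a toy input where "interacts with" occurs in three of five sentences and every other bigram occurs once, `rank_bigrams` ranks ("interacts", "with") first, and `expand_query("warfarin", "aspirin", [("interacts", "with")])` returns `"warfarin" AND "aspirin" AND ("interacts with")`. G² is used here because it tends to behave better than pointwise mutual information on rare pairs, which matters when the positive-example corpus is small.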
