Finding precise answers to medical questions: the question-answering system MEANS (Recherche de réponses précises à des questions médicales : le système de questions-réponses MEANS)

Finding precise answers to questions formulated in natural language is renewing the field of information retrieval. Much work has addressed answering factual questions in the open domain. Less work has addressed question answering in specialized domains, in particular the medical or biomedical domain. Specialized domains impose several distinct conditions, such as specialized lexicons and terminologies, particular types of questions, domain-specific entities and relations, and the characteristics of the target documents.

In the first part, we study methods for semantically analyzing the questions asked by the user as well as the texts used to find the answers. To do so, we use hybrid methods for two main tasks: (i) medical entity recognition and (ii) semantic relation extraction. These methods combine manually constructed rules and patterns, domain knowledge, and statistical learning techniques using different classifiers. Tested on different corpora, these hybrid methods make it possible to overcome the drawbacks of the two types of information extraction methods, namely the potentially limited coverage of rule-based methods and the dependence of statistical methods on annotated data.

In the second part, we study what semantic web technologies contribute to the portability and expressiveness of question-answering systems. In our approach, we use semantic web technologies first to annotate the extracted information and then to query these annotations semantically.

Finally, we present our question-answering system, called MEANS, which uses NLP techniques, domain knowledge, and semantic web technologies together to answer medical questions automatically.
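
The annotate-then-query idea can be illustrated with a minimal sketch. It is not the MEANS implementation: the triples, the `ex:` vocabulary, and the `match` and `query` helpers are all hypothetical, and a real system would store RDF annotations and query them with SPARQL rather than with this toy matcher. The sketch only shows how a question such as "What treats type 2 diabetes?" could become a conjunction of triple patterns evaluated over extracted annotations.

```python
# Hypothetical annotations that an entity and relation extraction step
# might produce, written as (subject, predicate, object) triples.
ANNOTATIONS = [
    ("metformin", "rdf:type", "ex:Treatment"),
    ("type 2 diabetes", "rdf:type", "ex:Problem"),
    ("metformin", "ex:treats", "type 2 diabetes"),
    ("aspirin", "rdf:type", "ex:Treatment"),
    ("aspirin", "ex:treats", "headache"),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Terms starting with '?' are variables. Returns the extended
    bindings, or None when the pattern does not fit the triple.
    """
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns (assumed here to play
    the role of a SPARQL basic graph pattern) over a list of triples."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


# "What treats type 2 diabetes?" expressed as two triple patterns.
QUESTION = [
    ("?x", "rdf:type", "ex:Treatment"),
    ("?x", "ex:treats", "type 2 diabetes"),
]
answers = [b["?x"] for b in query(QUESTION, ANNOTATIONS)]
print(answers)
```

Separating annotation from querying in this way means the question analysis only has to produce patterns over a shared vocabulary, which is one plausible reading of the portability argument made above.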
