A genetic algorithm for learning significant phrase patterns in radiology reports

Radiologists disagree about which characteristics and features constitute a normal mammogram and about the terminology to use in the associated radiology report. Recent work has focused on classifying abnormal or suspicious reports, but even this process needs further layers of clustering and gradation so that individual lesions can be classified more effectively. Using a genetic algorithm, the approach described here successfully learns phrase patterns for two distinct classes of radiology reports (normal and abnormal). These patterns can then serve as a basis for automatically analyzing, categorizing, clustering, or retrieving relevant radiology reports for the user.
