Text Mining of Clinical Records for Cancer Diagnosis

The ability to automatically identify relationships between cancer diseases and external factors in medical records, in support of cancer diagnosis, would be a valuable contribution to public health. Unfortunately, little attention has so far been paid to developing effective solutions in this problem domain. In this work, we propose a prototype for automating the extraction of relationships between cancer diseases and potential factors from clinical records. We describe the design of the system prototype, which integrates a cancer ontology with the text mining techniques we developed, including the self-organizing map (SOM) algorithm and support vector machine (SVM) methods. The results show that integrating the medical ontology with the text mining platforms makes it possible to extract potential patterns and re-categorize clinical records.
