Classification of Clinical Conditions: A Case Study on Prediction of Obesity and Its Co-morbidities

We investigate a multiclass, multilabel classification problem in the medical domain: predicting obesity and its co-morbidities. The challenges lie not only in statistical learning issues, such as high dimensionality and interdependence between multiple classes, but also in the characteristics of the data itself. In particular, narrative medical reports are written predominantly in free-text natural language, which raises the problems of pervasive synonymy, hyponymy, negation, and temporality. We comparatively evaluate a traditional statistical learning approach and an information extraction approach for developing predictive computational models. In addition, we propose a scalable framework that combines the statistical and extraction-based methods with an appropriate feature representation and selection strategy. The framework yields reliable classification results and was designed for participation in the second i2b2 Obesity Challenge.
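To make the extraction-based side concrete, the following minimal Python sketch shows how synonymy and negation might be handled when assigning one label per disease to a free-text report. It is an illustration under stated assumptions, not the framework described here: the synonym lists, negation cues, label names, and the function `extract_labels` are all hypothetical, and the statistical learning component is omitted entirely.

```python
import re

# Hypothetical synonym lists (assumption, not taken from the paper).
SYNONYMS = {
    "obesity": ["obesity", "obese"],
    "hypertension": ["hypertension", "htn", "high blood pressure"],
}

# Hypothetical negation cues, loosely in the spirit of trigger-based
# negation detection; a real system would use a much larger list.
NEGATION_CUES = ["no", "denies", "without", "negative for"]


def extract_labels(report):
    """Return one label per disease: 'present', 'absent', or 'unmentioned'."""
    sentences = re.split(r"[.\n]+", report.lower())
    labels = {}
    for disease, terms in SYNONYMS.items():
        label = "unmentioned"
        for sentence in sentences:
            for term in terms:
                match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                if match is None:
                    continue
                # A mention is treated as negated if any cue precedes it
                # within the same sentence.
                prefix = sentence[:match.start()]
                negated = any(
                    re.search(r"\b" + re.escape(cue) + r"\b", prefix)
                    for cue in NEGATION_CUES
                )
                if not negated:
                    label = "present"  # an affirmed mention wins
                elif label != "present":
                    label = "absent"
        labels[disease] = label
    return labels


if __name__ == "__main__":
    print(extract_labels("Patient is obese. Denies high blood pressure."))
```

Because each disease is labeled independently, the output is naturally multilabel. Temporality and hyponymy are not addressed in this sketch.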
