Combining text classification and Hidden Markov Modeling techniques for categorizing sentences in randomized clinical trial abstracts.

Randomized clinical trial (RCT) papers provide reliable information about the efficacy of medical interventions. Current keyword-based search methods for retrieving medical evidence overload users with irrelevant information, because these methods often do not take into consideration the semantics encoded within the abstracts and the search query. Personalized semantic search, intelligent clinical question answering, and medical evidence summarization aim to solve this information overload problem. Most of these approaches would benefit significantly if the information available in the abstracts were structured into meaningful categories (e.g., background, objective, method, result, and conclusion). While many journals use a structured abstract format, the majority of RCT abstracts still remain unstructured. We have developed a novel automated approach to structure RCT abstracts by combining text classification and Hidden Markov Modeling (HMM) techniques. The results of our approach (precision: 0.98, recall: 0.99) significantly outperform previously reported work on automated categorization of sentences in RCT abstracts.
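The abstract does not specify how the sentence classifier and the HMM are coupled. One common way to combine them, shown here only as a minimal sketch, is to treat the per-sentence class probabilities from a text classifier as emission scores and let Viterbi decoding enforce a plausible ordering of categories. Everything below is an assumption made for illustration: the function names, the left-to-right transition weights, the uniform start distribution, and the example probabilities are hypothetical and are not taken from the paper.

```python
import math

LABELS = ["background", "objective", "method", "result", "conclusion"]


def left_to_right_transitions(stay=3.0, step=2.0, skip=1.0):
    """Hypothetical transition table: a category may repeat or move forward,
    never backward. Weights are illustrative and normalized per row."""
    trans = {}
    for i, prev in enumerate(LABELS):
        weights = {}
        for j, cur in enumerate(LABELS):
            if j < i:
                weights[cur] = 0.0      # no moving back to an earlier section
            elif j == i:
                weights[cur] = stay
            elif j == i + 1:
                weights[cur] = step
            else:
                weights[cur] = skip
        total = sum(weights.values())
        for cur, w in weights.items():
            trans[(prev, cur)] = w / total
    return trans


def viterbi(emissions, start, trans):
    """Most likely label sequence given per-sentence classifier scores.

    emissions: list of dicts, label -> classifier probability for a sentence
    start:     dict, label -> probability of being the first label
    trans:     dict, (prev_label, cur_label) -> transition probability
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[i][s]: log-score of the best path ending in label s at sentence i
    best = [{s: log(start[s]) + log(emissions[0][s]) for s in LABELS}]
    back = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + log(trans[(p, cur)]))
            scores[cur] = best[-1][prev] + log(trans[(prev, cur)]) + log(em[cur])
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final label
    path = [max(LABELS, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up classifier outputs for a four-sentence abstract. Taken alone, the
# classifier labels the third sentence "background", which is out of order.
example_emissions = [
    {"background": 0.70, "objective": 0.20, "method": 0.05, "result": 0.03, "conclusion": 0.02},
    {"background": 0.05, "objective": 0.10, "method": 0.80, "result": 0.03, "conclusion": 0.02},
    {"background": 0.45, "objective": 0.05, "method": 0.05, "result": 0.40, "conclusion": 0.05},
    {"background": 0.04, "objective": 0.03, "method": 0.03, "result": 0.10, "conclusion": 0.80},
]
uniform_start = {s: 1.0 / len(LABELS) for s in LABELS}

independent = [max(em, key=em.get) for em in example_emissions]
decoded = viterbi(example_emissions, uniform_start, left_to_right_transitions())
print(independent)
print(decoded)
```

In this toy input, labeling each sentence independently places a "background" sentence after a "method" sentence, whereas the decoded sequence assigns "result" to the third sentence because the assumed transitions forbid moving backward. In practice the transition probabilities would be estimated from structured abstracts rather than set by hand.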
