A Language Classifier that Automatically Divides Medical Documents for Experts and Health Care Consumers

We propose a pipelined system that automatically classifies medical documents first by language (English, Spanish, or German) and then by target user group (medical experts vs. health care consumers). Both steps use the same simple n-gram-based categorization model, and we present experimental results for each classification task. We also demonstrate how this methodology can be integrated into a health care document retrieval system.
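
To make the two-stage pipeline concrete, the following minimal sketch shows one way such a classifier could be arranged. The abstract does not specify the n-gram model, so the sketch assumes rank-ordered character n-gram profiles compared with an out-of-place distance; this is a common choice for n-gram text categorization and not necessarily the model used here. All function names, the profile size, and the n-gram lengths are hypothetical.

```python
from collections import Counter

TOP_K = 300  # assumed profile size; not taken from the paper


def ngram_profile(text, n_max=3, top_k=TOP_K):
    """Map the top_k most frequent character n-grams (n = 1..n_max) to their rank."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _) in enumerate(ranked[:top_k])}


def out_of_place(doc_profile, cat_profile, penalty=TOP_K):
    """Sum of rank differences; n-grams missing from the category get a fixed penalty."""
    return sum(
        abs(rank - cat_profile[gram]) if gram in cat_profile else penalty
        for gram, rank in doc_profile.items()
    )


def train(samples):
    """Build one profile per label from a dict {label: training text}."""
    return {label: ngram_profile(text) for label, text in samples.items()}


def classify(text, profiles):
    """Return the label whose profile is closest to the document's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda label: out_of_place(doc, profiles[label]))


def classify_document(text, language_profiles, audience_profiles):
    """Stage 1: pick the language. Stage 2: pick the audience within that language."""
    language = classify(text, language_profiles)
    audience = classify(text, audience_profiles[language])
    return language, audience
```

In this arrangement, `language_profiles` would be built by calling `train` on one text sample per language, and `audience_profiles` would hold a separate expert/consumer profile set for each language, so that the second stage only compares a document against profiles in the language selected by the first stage.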