Use of a Latent Topic Model for Characteristic Extraction from Health Checkup Questionnaire Data

OBJECTIVES When patients complete questionnaires during health checkups, many of their responses are subjective, which makes topic extraction difficult. The purpose of this study was therefore to develop a model capable of extracting appropriate topics from subjective questionnaire data collected during health checkups.

METHODS We employed a latent topic model to group the lifestyle habits of the study participants, representing their responses to items on health checkup questionnaires as a probability model. For the probability model, we used latent Dirichlet allocation to extract 30 topics from the questionnaires. A total of 4381 study participants were then divided into groups based on these topics, according to the model parameters. Laboratory test results, including blood glucose level, triglycerides, and estimated glomerular filtration rate, were compared between groups, and the resulting grouping was then compared with one obtained by hierarchical clustering.

RESULTS A significant (p < 0.05) difference between groups in any laboratory measurement was taken to indicate a questionnaire response pattern corresponding to the value of that test result. A comparison between the latent topic model grouping and the hierarchical clustering grouping revealed that, with the latent topic model, a small group of participants who reported subjective signs of urinary disorder was allocated to a single group.

CONCLUSIONS The latent topic model is useful for extracting the characteristics of a small number of groups from questionnaires with a large number of items. These results show that, in addition to chief complaints and history of past illness, questionnaire data obtained during medical checkups can serve as useful criteria for assessing the condition of patients.
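
The grouping procedure described in METHODS can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not state the inference algorithm, hyperparameters, or grouping rule, so the code below assumes collapsed Gibbs sampling, symmetric priors `alpha` and `beta`, and assignment of each participant to the topic with the highest estimated proportion. The function names, the toy responses (lists of endorsed questionnaire item indices), and the glucose values are hypothetical, and 2 topics are used here instead of the 30 in the study.

```python
import random
from collections import defaultdict


def fit_lda(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed inference method).

    docs: one list of item indices per participant.
    Returns theta, the per-participant topic proportions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]              # counts per (participant, topic)
    topic_item = [[0] * n_vocab for _ in range(n_topics)]   # counts per (topic, item)
    topic_total = [0] * n_topics                            # counts per topic
    assign = []

    # Random initial topic assignment for every response.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_item[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_item[k][w] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_item[t][w] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_item[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


def group_by_dominant_topic(theta):
    """Assign each participant to the topic with the largest proportion (assumed rule)."""
    groups = defaultdict(list)
    for d, row in enumerate(theta):
        groups[max(range(len(row)), key=row.__getitem__)].append(d)
    return dict(groups)


# Hypothetical toy data: six participants, six questionnaire items.
docs = [[0, 1, 2], [0, 1], [1, 2, 0], [3, 4, 5], [4, 5], [3, 5, 4]]
glucose = [98.0, 102.0, 95.0, 121.0, 130.0, 118.0]  # made-up lab values

theta = fit_lda(docs, n_topics=2, n_vocab=6)
groups = group_by_dominant_topic(theta)

# Mean laboratory value per group; a significance test would follow in practice.
group_means = {
    k: sum(glucose[d] for d in members) / len(members)
    for k, members in groups.items()
}
print(group_means)
```

In the study, the between-group comparison relied on a significance threshold of p < 0.05; the sketch stops at group means because the abstract does not name the statistical test used.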
