Assessing Topic Models: How to Obtain Robustness?

In this work, we investigate how varying characteristics of daily activity datasets influence the stability of topic model performance for daily routine discovery. To this end, we identify a set of key dataset properties that affect both the experimental design of the recording and the data pre-processing steps. Using generated daily activity datasets, we determined the dataset properties for which topic model stability is optimal. Results indicated that the routine duration should exceed the topic model document size by a factor of more than two. Recording durations of more than 9 days were required for a set of four routines, and activity primitive overlap must not exceed 5%.
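The reported thresholds can serve as a checklist when planning a recording. The sketch below is illustrative and is not the authors' tooling. `DatasetDesign` and `check_design` are hypothetical names. Routine duration and document size are assumed to share the same time unit, and "primitive overlap" is assumed to be a fraction between 0 and 1. The 9-day threshold is applied only to the four-routine case, because that is the only case the results report.

```python
from dataclasses import dataclass


@dataclass
class DatasetDesign:
    """Hypothetical description of a planned daily activity recording."""
    routine_duration: float   # typical duration of one routine (same unit as document_size)
    document_size: float      # topic model document (window) size
    recording_days: float     # total recording duration in days
    num_routines: int         # number of routines to be discovered
    primitive_overlap: float  # assumed: fraction (0..1) of activity primitives shared between routines


def check_design(d: DatasetDesign) -> list[str]:
    """Return the reported guidelines that a design violates (empty list if none)."""
    issues = []
    # Routine duration should exceed the document size by a factor of more than two.
    if d.routine_duration <= 2 * d.document_size:
        issues.append("routine duration not > 2x document size")
    # More than 9 recording days were required for a set of four routines;
    # no threshold is reported for other routine counts, so only that case is checked.
    if d.num_routines == 4 and d.recording_days <= 9:
        issues.append("recording duration not > 9 days for four routines")
    # Activity primitive overlap must not exceed 5%.
    if d.primitive_overlap > 0.05:
        issues.append("activity primitive overlap > 5%")
    return issues


# Example: 120-minute routines, 30-minute documents, 10 days, four routines, 3% overlap.
print(check_design(DatasetDesign(120, 30, 10, 4, 0.03)))  # prints []
```

A design that meets all three thresholds yields an empty list. Any violation is reported by name, so several candidate recording plans can be compared before data collection.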