An Evaluation of Machine Learning Methods and Visualization of Results to Characterize Large Healthcare Document Collections