Which is more important for classifying microbial communities: who's there or what they can do?

Classification is a supervised machine-learning approach in which a predictive model is trained to assign samples to predefined categories. In microbial studies, these categories include disease states and habitats. An ongoing question in microbial ecology is which level of analysis best discriminates among biologically relevant sample types. Many studies use the 16S rRNA gene as a taxonomic marker and then ask how effectively the resulting taxonomic profiles classify or cluster microbial communities according to their sample types. Interestingly, the answer may depend on the question being asked. In phylogenetic analyses, different levels of taxonomic resolution succeed at different classification tasks. For example, distinguishing samples by the person they came from depends on fine distinctions among very closely related strains or species, whereas separating lean from obese individuals works better with very broad groups of taxa (Knights et al., 2011b).
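The effect of taxonomic resolution on a classification task can be illustrated with a minimal sketch. It assumes that each sample is a table of counts keyed by a semicolon-separated lineage (phylum;genus;strain), collapses those counts to a chosen depth, and scores a simple nearest-centroid classifier by leave-one-out cross-validation. The counts, the lineage names, the helper functions (`collapse`, `relative`, `centroid`, `sq_dist`, `loo_accuracy`) and the choice of classifier are all hypothetical and chosen for illustration; they are not the data or the method of Knights et al. (2011b).

```python
from collections import defaultdict


def collapse(profile, depth):
    """Sum counts over taxa that share the first `depth` ranks of their lineage."""
    out = defaultdict(float)
    for lineage, count in profile.items():
        key = ";".join(lineage.split(";")[:depth])
        out[key] += count
    return dict(out)


def relative(profile):
    """Convert raw counts to relative abundances."""
    total = sum(profile.values())
    return {taxon: count / total for taxon, count in profile.items()}


def centroid(profiles):
    """Mean relative abundance of each taxon across a list of profiles."""
    taxa = set().union(*profiles)
    return {t: sum(p.get(t, 0.0) for p in profiles) / len(profiles) for t in taxa}


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse profiles."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def loo_accuracy(samples, labels, depth):
    """Leave-one-out accuracy of a nearest-centroid classifier at one depth."""
    feats = [relative(collapse(s, depth)) for s in samples]
    correct = 0
    for i, held_out in enumerate(feats):
        # Build class centroids from every sample except the held-out one.
        by_label = defaultdict(list)
        for j, (feat, label) in enumerate(zip(feats, labels)):
            if j != i:
                by_label[label].append(feat)
        pred = min(by_label, key=lambda y: sq_dist(held_out, centroid(by_label[y])))
        correct += pred == labels[i]
    return correct / len(samples)


# Invented toy counts: each person carries different strains, but the
# phylum-level balance differs between the two (hypothetical) groups.
samples = [
    {"Firmicutes;Clostridium;strainA": 80, "Bacteroidetes;Bacteroides;strainC": 20},
    {"Firmicutes;Clostridium;strainB": 80, "Bacteroidetes;Bacteroides;strainD": 20},
    {"Firmicutes;Clostridium;strainA": 20, "Bacteroidetes;Bacteroides;strainC": 80},
    {"Firmicutes;Clostridium;strainB": 20, "Bacteroidetes;Bacteroides;strainD": 80},
]
labels = ["obese", "obese", "lean", "lean"]

for depth, rank in [(1, "phylum"), (2, "genus"), (3, "strain")]:
    print(rank, loo_accuracy(samples, labels, depth))
```

In this toy table the strain-level profiles are dominated by person-specific strains, so the coarse phylum-level features separate the two groups more reliably. A task such as identifying the individual would need the strain-level detail that collapsing discards. Real data sets are far larger and noisier, and the best resolution has to be determined empirically.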