Large-scale data analysis to identify novel disease phenotypes and genes