Finding families for genomic ORFans

‘Why, if species have descended from other species by insensibly fine gradations, do we not everywhere see innumerable transitional forms?’
Charles Darwin, in The Origin of Species, Chapter 6: Difficulties in Theory

The complete sequences of over a dozen microbial genomes are now known. At first glance roughly one-third of the protein-encoding regions (ORFs) in each genome have no detectable sequence similarity to proteins of other genomes. Why, if proteins in different organisms have descended from common ancestral proteins by duplication and adaptive variation, do so many today show no similarity to each other? In this commentary we refer to these orphan ORFs as ‘ORFans’ and ask why there are so many, and how they can be assigned to known protein families. As a first step in solving the puzzle of why there are so many genomic ORFans, we have investigated some trivial explanations.