Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches