An Optimal Transport Kernel for Feature Aggregation and its Relationship to Attention