Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding