PRESIDE: A Judge Entity Recognition and Disambiguation Model for US District Court Records

The docket sheet of a court case contains a wealth of information about the progression of a case, the parties’ and judge’s decision-making along the way, and the case’s ultimate outcome that can be used in analytical applications. However, the unstructured text of the docket sheet and the terse and variable phrasing of docket entries require the development of new models to identify key entities to enable analysis at a systematic level. We developed a judge entity recognition language model and disambiguation pipeline for US District Court records. Our model can robustly identify mentions of judicial entities in free text (~99% F-1 Score) and outperforms general state-of-the-art language models by 13%. Our disambiguation pipeline is able to robustly identify both appointed and non-appointed judicial actors and correctly infer the type of appointment (~99% precision). Lastly, we show with a case study on in forma pauperis decision-making that there is substantial error (~30%) attributing decision outcomes to judicial actors if the free text of the docket is not used to make the identification and attribution.