A Text Similarity Approach for Precedence Retrieval from Legal Documents

Precedence retrieval of legal documents is an information retrieval task in which prior case documents related to a given case document are retrieved. It supports automatic linking of related documents, which helps ensure that identical situations are treated similarly in every case. Several methodologies, such as information extraction based on natural language processing, rule-based methods, and machine learning techniques, are used to retrieve the prior cases relevant to a current case. In this paper, we propose a text similarity approach for precedence retrieval that retrieves older cases similar to a given case from a set of legal documents. Lexical features are extracted from all the legal documents, and the similarity between each current case document and every prior case document is determined using cosine similarity scores. For each current case document, the list of prior case documents is ranked by these similarity scores. We have evaluated our approach using the data set provided by the IRLeD@FIRE2017 shared task.
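
The ranking step described above can be illustrated with a minimal sketch. The abstract does not specify which lexical features are used, so the sketch assumes TF-IDF-weighted word counts as a stand-in. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_prior_cases`) are hypothetical and chosen only for illustration; they are not taken from the system evaluated in this paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphabetic tokens; an assumed stand-in for the lexical features.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build sparse TF-IDF vectors (term -> weight) for a list of documents.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Raw term count times a smoothed inverse document frequency.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_prior_cases(current_case, prior_cases):
    # prior_cases maps a document id to its text.
    # Returns (id, score) pairs sorted from most to least similar.
    ids = list(prior_cases)
    # IDF statistics are computed over the current case plus all prior cases.
    vectors = tfidf_vectors([current_case] + [prior_cases[i] for i in ids])
    query, candidates = vectors[0], vectors[1:]
    scored = [(i, cosine(query, vec)) for i, vec in zip(ids, candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_prior_cases(current_text, {"prior_1": text_1, "prior_2": text_2})` returns the prior case ids ordered by decreasing cosine similarity to the current case. In practice the same function would be applied once per current case document to produce one ranked list each.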