Unsupervised approaches for measuring textual similarity between legal court case reports

Artificial Intelligence and Law 29 (3):417-451 (2021)

Abstract

In legal information retrieval, an important challenge is computing the similarity between two legal documents. Precedents play an important role in the common law system, where lawyers frequently need to refer to relevant prior cases. Measuring document similarity is one of the most crucial aspects of any document retrieval system, since it determines the system's speed, scalability and accuracy. Text-based and network-based methods for computing similarity among case reports have been proposed in prior work, but not without pitfalls. Because legal citation networks are generally highly disconnected, network-based metrics are poorly suited to them. To date, only a few text-based methods and the predominant embedding-based methods have been employed, for instance TF-IDF, Word2Vec and Doc2Vec based approaches. We investigate the performance of 56 methods for computing textual similarity across court case statements, applied to a dataset of Indian Supreme Court cases. Of these 56 methods, 30 are adaptations of existing methods and 26 are our proposed methods. The methods studied include models such as BERT and Law2Vec. We observe that the more traditional methods, which rely on a bag-of-words representation, perform better than the more advanced context-aware methods at computing document-level similarity. Finally, we nominate, via empirical validation, our five best-performing methods as appropriate for measuring similarity between case reports. Of these five, two are adaptations of existing methods and three are our proposed methods.
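As a rough illustration of the bag-of-words family the abstract refers to, the sketch below computes TF-IDF vectors and cosine similarity for a few toy sentences. It is not the authors' implementation: the tokenizer, the smoothed IDF variant, the function names and the example sentences are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {}
        for term, count in counts.items():
            tf = count / len(toks)
            # Smoothed IDF, one common variant; other weightings are possible.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "case statements", invented for illustration only.
toy_docs = [
    "The appellant challenged the conviction for murder under the penal code",
    "The conviction for murder under the penal code was upheld on appeal",
    "The tenant disputed the eviction notice issued by the landlord",
]
toy_vectors = tfidf_vectors(toy_docs)
print(cosine_similarity(toy_vectors[0], toy_vectors[1]))
print(cosine_similarity(toy_vectors[0], toy_vectors[2]))
```

The first pair shares most of its vocabulary and should score higher than the second, which overlaps only on a function word. A representation like this ignores word order and context, which is the contrast the abstract draws with context-aware models.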

Similar books and articles

言葉の意味の類似性判別に関するシソーラスと概念ベースの性能評価 [Performance evaluation of thesauri and concept bases for judging the semantic similarity of words]. 石川 勉 川島 貴広 - 2005 - Transactions of the Japanese Society for Artificial Intelligence 20:326-336.
Case-based reasoning and its implications for legal expert systems. Kevin D. Ashley - 1992 - Artificial Intelligence and Law 1 (2):113-208.
単語の属性空間の表現方法 [A method for representing the attribute space of words]. 稲子 希望 笠原 要 - 2002 - Transactions of the Japanese Society for Artificial Intelligence 17:539-547.

Analytics

Added to PP
2021-01-05
