Abstract
This paper explores the role of corpus linguistics in legal research, emphasizing the need for a representative and balanced corpus of legal language. While existing corpora provide valuable insights, they often cover only a limited range of genres and therefore risk giving an incomplete picture of the complexity of legal discourse. The paper advocates an empirical and comprehensive study of legal language and introduces KoPr, a pioneering representative corpus of Czech legal language, as an illustrative example. The corpus encompasses diverse legal texts, including legislation, case law, academic literature, and communication by legal practitioners. The methodology for constructing KoPr is described in detail, from defining legal genres to adapting corpus design principles specifically for legal discourse. The paper also addresses the practical and methodological limitations encountered during corpus development and proposes ways to mitigate them. The systematic organization of legal texts within the corpus ensures its representativeness and usability. By presenting KoPr’s parameters and applications, the study highlights the transformative potential of corpus-based research in legal linguistics, offering new insights into the linguistic foundations of the law.