CriminelBART: A French Canadian Legal Language Model Specialized in Criminal Law

Jun 23, 2021

11:10

2nd panel - extended abstracts - 5 minutes

Garneau, Nicolas; Gaumond, Eve; Lamontagne, Luc; Déziel, Pierre-Luc

Abstract: Learning language representations is a key component of many natural language processing tasks, but the usefulness of these representations is often limited by the target domain and its vocabulary. Language models have been shown to be surprisingly effective at learning such representations and transferring them to specific domains.

However, the more specialized the target domain, the harder the transfer, so proper fine-tuning is required. This is why we introduce CriminelBART, a French Canadian legal language model specialized in criminal law. CriminelBART has been trained exclusively on criminal law data, so the model learned language representations specialized for the criminal domain and not for any other area of law.

We illustrate its usefulness on two tasks. The first, semantic textual similarity, is discriminative: we analyze the impact of good language representations on textual classification involving semantic reasoning. The second analyzes the generative capabilities of CriminelBART with a suite of cloze tests. These are the first stepping stones in the highly specialized arena of French Canadian criminal law.
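To make the cloze-test idea concrete, here is a minimal Python sketch of how such an evaluation could be set up: a term is masked out of a sentence and a model is scored on whether it recovers it among its top-k guesses. This is an illustration under stated assumptions, not the authors' evaluation code. The names `make_cloze`, `top_k_accuracy`, and `toy_predict`, the `<mask>` token, and the two example sentences are all hypothetical placeholders (the sentences are in English and are not drawn from the paper's data); a real setup would call the language model in place of `toy_predict`.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"  # assumed mask token, chosen for illustration only


def make_cloze(sentence: str, target: str) -> Tuple[str, str]:
    """Replace the first occurrence of `target` with the mask token."""
    if target not in sentence:
        raise ValueError(f"target {target!r} not found in sentence")
    return sentence.replace(target, MASK, 1), target


def top_k_accuracy(
    items: List[Tuple[str, str]],
    predict: Callable[[str], List[str]],
    k: int = 1,
) -> float:
    """Fraction of cloze items whose answer is among the top-k predictions."""
    if not items:
        return 0.0
    hits = 0
    for prompt, answer in items:
        candidates = predict(prompt)[:k]  # predictions are ranked best-first
        if answer in candidates:
            hits += 1
    return hits / len(items)


def toy_predict(prompt: str) -> List[str]:
    """Hypothetical stand-in for a language model: a fixed ranked guess list."""
    return ["guilty", "innocent"]


# Invented placeholder sentences, not taken from any legal corpus.
items = [
    make_cloze("The accused was found guilty.", "guilty"),
    make_cloze("The judge imposed a sentence.", "sentence"),
]

accuracy = top_k_accuracy(items, toy_predict, k=1)
```

With the fixed toy predictor, the first item is answered correctly and the second is not, so `accuracy` is 0.5. Passing the predictor in as a function keeps the scoring logic independent of whichever model supplies the guesses.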

Copyright 2021 ICAIL. All rights reserved