Garneau, Nicolas; Gaumond, Eve; Lamontagne, Luc; Déziel, Pierre-Luc
Abstract: Learning language representations is a key component of many natural language processing tasks, and their usefulness is often challenged by the target domain and its vocabulary. Language models have been shown to be surprisingly effective at learning such representations and transferring them to specific domains.
However, the more specialized the target domain, the harder the transfer becomes, and proper fine-tuning is therefore required. This is why we introduce CriminelBART, a French Canadian Legal Language Model specialized in Criminal Law. CriminelBART has been trained exclusively on criminal-law data; the model has therefore learned language representations specialized for the criminal domain rather than for any other area of law.
We illustrate its usefulness on two tasks. The first, semantic textual similarity, is discriminative: we analyze the impact of good language representations on textual classification involving semantic reasoning. The second probes the generative capabilities of CriminelBART with a suite of Cloze tests. These are the first stepping stones in the unique and very particular arena of French-Canadian criminal law.