Garneau, Nicolas; Gaumond, Eve; Lamontagne, Luc; Déziel, Pierre-Luc
Abstract: In this paper, we introduce a new French Data-to-Text dataset in the legal domain: Plum2Text.
It consists of plumitif (docket file)–description pairs derived from publicly available documents issued by some Canadian criminal courts.
Plum2Text will likely be useful for training statistical natural language generation algorithms that make plumitifs easily understandable for Canadian citizens. The inputs and outputs of the dataset are unique: on the data side, the table values contain long textual utterances; on the text side, the reference most often consists of a paraphrase of the table values. We describe how we curated the plumitif–description associations by introducing an annotation tool and a methodology specific to the Data-to-Text natural language generation task.
We do so by using simple yet efficient text classifiers that help the annotator leverage previously annotated examples during the annotation process. We also analyze the benefits of decontextualizing the descriptions, conduct experiments with a baseline generation model on Plum2Text, and assess its performance with standard metrics.