Salaün, Olivier; Langlais, Philippe; Benyekhlef, Karim
Abstract: Legal judgment prediction (LJP) can be formalized as a text classification task in which a model is given the factual description of a dispute and must return labels such as the verdict decided by the judge, the relevant law articles, or the charges. The literature shows that using articles as input features improves classification performance.
In our work, we designed a verdict prediction task as text classification based on landlord-tenant tribunal decisions, and we applied a BERT-based model to which we fed different article-based representations.
Although adding such features yields a gain of up to 3.5% in exact match, it delivers mixed results in terms of macro-averaged F1 score, as this approach only improves the prediction of the most frequent labels and fails at predicting the least frequent ones. We also notice that some conditions must hold for the article-based features to improve the F1 score of some verdict labels. All in all, these experiments suggest that pre-trained and fine-tuned transformer-based models are not scalable as is for legal reasoning in real-life scenarios, as they would only excel at accurately predicting the most recurrent verdicts, to the detriment of other legal outcomes.
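
To illustrate how exact match and macro-averaged F1 can move in opposite directions, here is a minimal sketch on toy data. The label names ("frequent", "rare"), the ten cases, and both sets of predictions are hypothetical and are not taken from our dataset or results; the metric functions are plain-Python stand-ins written for this illustration.

```python
# Toy illustration only: labels, cases, and predictions are hypothetical.

def exact_match(y_true, y_pred):
    """Share of cases whose predicted label set equals the gold label set."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_label(y_true, y_pred, label):
    """F1 score of a single label, computed from its TP, FP, and FN counts."""
    tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
    fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
    fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1, so rare labels count as much as frequent ones."""
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)

labels = ["frequent", "rare"]

# Ten cases: eight with the frequent verdict, two with the rare one.
gold = [{"frequent"}] * 8 + [{"rare"}] * 2

# Hypothetical baseline: some errors on both labels.
baseline = [{"frequent"}] * 6 + [{"rare"}] * 2 + [{"rare"}, {"frequent"}]

# Hypothetical model with extra features: always predicts the frequent verdict.
with_features = [{"frequent"}] * 10

for name, pred in [("baseline", baseline), ("with features", with_features)]:
    print(f"{name}: exact match = {exact_match(gold, pred):.3f}, "
          f"macro F1 = {macro_f1(gold, pred, labels):.3f}")
```

In this toy setting, the second system has the higher exact match (0.8 against 0.7) but the lower macro-averaged F1 (about 0.44 against 0.6), because its F1 on the rare label drops to zero.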