Sovrano, Francesco; Palmirani, Monica; Distefano, Biagio; Sapienza, Salvatore; Vitali, Fabio
Abstract: Private International Law (PIL) is a complex legal domain in which norms frequently conflict across the hierarchy of legal sources, legal domains, and the adopted procedures. Scientific research on PIL reveals the need to build a bridge between European and national laws. In this context, legal experts have to access heterogeneous sources, recall all the norms, and combine them using case law and the principles of interpretation theory. This poses a daunting challenge to humans whenever regulations change frequently or are sufficiently large. Automated reasoning over legal texts is not a trivial task either, because legal language is highly specific and differs in many ways from commonly used natural language. When applying state-of-the-art language models to the understanding of legalese, one recurring challenge is how to make optimal use of the limited data available. Data scarcity makes it hard to apply state-of-the-art sub-symbolic question answering algorithms to legislative texts, especially those concerning PIL. In this paper we extend previous work on legal question answering by publishing a larger and more carefully curated dataset for the evaluation of automated question answering on PIL.