Smywiński-Pohl, Aleksander; Piech, Mateusz; Kaleta, Zbigniew; Wróbel, Krzysztof
Abstract: The article discusses the problem of automatically detecting and structuring amendments found in Polish statutory law. We treat the problem as a token-classification task and introduce a token classification scheme constructed through the analysis of more than 200 amending bills. We apply recent neural architectures such as BERT and BiRNN to this task. The results are very high: the best model (BiRNN) achieves a micro-average F1 score of 98.2%, and the remaining models are not much worse (the lowest score is still above 96%). Besides the experiments conducted for a number of neural models, we also introduce an algorithm devised for converting the classified tokens into fully structured amendments. The algorithm is simple and hence can be easily implemented. The conversion algorithm achieves almost 97% coverage. The presented solution shows that automatic amendment extraction is feasible thanks to recent developments in Natural Language Processing.
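
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-average F1 score can be computed for token classification: true positives, false positives, and false negatives are pooled over all classes before precision and recall are calculated. The label names (`ACTION`, `TARGET`, `CONTENT`, `O`) and the choice to exclude the `O` label from the counts are illustrative assumptions, not the scheme or evaluation protocol used in the article.

```python
def micro_f1(gold, pred, ignore_label="O"):
    """Micro-averaged F1 over token labels.

    Counts are pooled across all classes. Tokens whose gold and predicted
    labels are both `ignore_label` are skipped (an assumption made for this
    sketch; the article may score every label).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            if g != ignore_label:
                tp += 1          # correct non-O prediction
        else:
            if p != ignore_label:
                fp += 1          # predicted a class that is not the gold one
            if g != ignore_label:
                fn += 1          # missed the gold class

    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for a six-token amendment fragment.
gold = ["O", "ACTION", "TARGET", "TARGET", "O", "CONTENT"]
pred = ["O", "ACTION", "TARGET", "O", "O", "CONTENT"]
score = micro_f1(gold, pred)
print(f"micro F1 = {score:.3f}")
```

In this toy example one `TARGET` token is missed, so precision is 3/3, recall is 3/4, and the micro-average F1 is 6/7.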