Using Transformers to Improve Answer Retrieval for Legal Questions

Jun 24, 2021

16:00

6th panel - Short paper - 15 minutes

Vold, Andrew; Conrad, Jack G.

Abstract: Recent deep learning approaches to tasks such as open domain question answering have produced breakthroughs in accuracy, and they have enabled comparable advances in closed domain question answering in fields such as legal question answering (Legal QA). These gains extend to both factoid and non-factoid question answering. Bidirectional transformers such as BERT have delivered significant improvements over baselines on standard open domain question answering tasks using established collections, creating opportunities for similar advances in Legal QA. In this work, we describe the challenges faced when producing a robust RoBERTa model for an operational environment, and the QA system development cycle we created to address them. We compare a fine-tuned RoBERTa-based deep learning model with a traditional linear SVM that uses TF-IDF-based features, evaluating both on the PrivacyQA question answering collection. We show that, with sufficient training and tuning, the RoBERTa-base model outperforms the linear SVM approach by 12% in F-score. We also discuss opportunities for further refinements and improvements.
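
To make the comparison concrete, below is a minimal, self-contained Python sketch of the two approaches being contrasted, framed (as PrivacyQA does) as binary relevance classification of question/candidate-sentence pairs. The toy data, variable names, and the concatenation scheme are illustrative assumptions, not the authors' actual pipeline; real fine-tuning would also update the RoBERTa weights on the training split rather than using the untrained classification head shown here.

    # Hypothetical sketch: linear SVM over TF-IDF features vs. roberta-base as
    # a question/candidate-sentence pair classifier. Toy data and names are
    # placeholders, not the paper's actual setup.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import f1_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy (question, candidate sentence, relevant?) triples -- placeholders.
    data = [
        ("does the app share my location?", "we share location data with partners", 1),
        ("does the app share my location?", "the app is free to download", 0),
        ("how long is my data retained?", "data is retained for 30 days", 1),
        ("how long is my data retained?", "contact support for billing issues", 0),
    ]
    texts = [q + " " + s for q, s, _ in data]   # concatenate question + candidate
    labels = [y for _, _, y in data]

    # Baseline: linear SVM over TF-IDF features of the concatenated pair.
    svm = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    svm.fit(texts, labels)
    print("SVM F1 (train, toy data):", f1_score(labels, svm.predict(texts)))

    # roberta-base as a sequence-pair classifier (forward pass only; fine-tuning
    # would train this head and the encoder on the labeled pairs).
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=2
    )
    enc = tok(
        [q for q, _, _ in data],
        [s for _, s, _ in data],
        padding=True, truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits          # shape: (4, 2)
    preds = logits.argmax(dim=-1).tolist()    # 0 = irrelevant, 1 = relevant
    print("RoBERTa predictions (untrained head):", preds)

At inference time, either model scores every candidate sentence for a given question, and the top-scoring sentences are returned as answers; the paper's 12% F-score gap reflects how much the fine-tuned transformer improves that ranking over the TF-IDF baseline.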

Copyright 2021 ICAIL. All rights reserved.