McConnell, Devin J.; Zhu, James; Pandya, Sachin S.; Aguiar, Derek Cole
Abstract: Lawyers regularly predict court outcomes to make strategic decisions, including when, if at all, to sue or settle, what to argue, and how to reduce their clients’ liability risk. Yet lawyers’ predictions tend to be poorly calibrated and biased, which exacerbates unjustifiable disparities in civil case outcomes. Current machine learning (ML) approaches for predicting court outcomes tend to rely on features that are unavailable in real time during litigation. Here, we present the first prospective machine learning methods to support lawyer and client decision-making for motion filings in civil proceedings. We demonstrate the utility of incorporating natural language features extracted from complaint documents, which contain information specific to the claims alleged. Using State of Connecticut Judicial Branch administrative data and court case documents, we train six classifiers to predict motion to strike outcomes in tort and vehicular cases between July 1, 2004 and February 18, 2019. Integrating dense word embeddings and algorithmic classification rules with the administrative data improved classification accuracy across all models. Models defined over word2vec features with corpus-specific TF-IDF weightings and algorithmic classification rules yielded the highest classification accuracies, with AdaBoost reaching 64.4%. Subsequent analysis of feature importance weights confirmed the usefulness of incorporating natural language features from complaint documents. Because all features used to train our models are available to lawyers during litigation, these methods can help them make better predictions than they otherwise could, given disparities in lawyer and client resources. All machine learning models, training code, and evaluation scripts are available at https://github.com/randomuserICAIL/motionpredict.
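To illustrate what a TF-IDF-weighted word embedding feature of the kind mentioned above might look like, the following is a minimal sketch and not the authors' implementation. It assumes a tokenized corpus, uses a commonly used smoothed IDF formula, and substitutes tiny hand-written vectors for trained word2vec embeddings; the function names `idf_weights` and `tfidf_weighted_embedding` and all data are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency per token (assumed formula)."""
    n_docs = len(docs)
    # Count each token once per document it appears in.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def tfidf_weighted_embedding(doc, idf, vectors, dim):
    """Average the word vectors of `doc`, weighting each token by tf * idf."""
    counts = Counter(tok for tok in doc if tok in vectors and tok in idf)
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, tf in counts.items():
        weight = tf * idf[tok]
        weight_sum += weight
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
    if weight_sum == 0.0:
        return total  # no known tokens: fall back to the zero vector
    return [x / weight_sum for x in total]


# Hypothetical toy corpus standing in for tokenized complaint documents.
complaints = [
    ["plaintiff", "alleges", "negligence"],
    ["plaintiff", "alleges", "collision"],
]
# Hypothetical 2-dimensional vectors standing in for word2vec embeddings.
toy_vectors = {
    "plaintiff": [1.0, 0.0],
    "alleges": [1.0, 0.0],
    "negligence": [0.0, 1.0],
    "collision": [0.0, -1.0],
}

idf = idf_weights(complaints)
features = [
    tfidf_weighted_embedding(doc, idf, toy_vectors, dim=2) for doc in complaints
]
```

In this sketch, tokens that appear in every document ("plaintiff", "alleges") receive a lower weight than document-specific tokens ("negligence", "collision"), so the resulting document vectors differ mainly along the dimension carried by the distinctive terms. Vectors of this form could then be concatenated with administrative features and passed to a classifier such as AdaBoost.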