UNSIGNED JUDICIAL OPINIONS

* William Li is a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and a 2012 graduate of the Technology and Policy Program at the Massachusetts Institute of Technology (MIT).
* Pablo Azar is a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT).
* David Larochelle is an engineer at the Berkman Center for Internet & Society at Harvard University.
* Phil Hill is a Fellow at the Berkman Center for Internet & Society at Harvard University and a 2013 J.D. Candidate at Harvard Law School.
* James Cox was an associate with Jenner & Block LLP during drafting of this Article, and currently serves as an attorney for the United States government.
* Robert C. Berwick is Professor of Computational Linguistics and Computer Science and Engineering in the Departments of Electrical Engineering and Computer Science and Brain and Cognitive Sciences, MIT.
* Andrew W. Lo is the Charles E. and Susan T. Harris Professor at the MIT Sloan School of Management, Principal Investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT), and a joint faculty in the MIT Electrical Engineering and Computer Science Department.
† We thank John Cox at MIT, Andy Sellars and Ryan Budish at the Berkman Center, and Philip C. Berwick at the Washington University in St. Louis Law School for their invaluable feedback, and Jayna Cummings for editorial assistance.

STANFORD TECHNOLOGY LAW REVIEW [Vol. 16:485

This Article proposes a novel and provocative analysis of judicial opinions that are published without indicating individual authorship. Our approach provides an unbiased, quantitative, and computer scientific answer to a problem that has long plagued legal commentators.

United States courts publish a shocking number of judicial opinions without divulging the author.
Per curiam opinions, as traditionally and popularly conceived, are a means of quickly deciding uncontroversial cases in which all judges or justices are in agreement. Today, however, unattributed per curiam opinions often dispose of highly controversial issues, frequently over significant disagreement within the court. Obscuring authorship removes the sense of accountability for each decision’s outcome and the reasoning that led to it. Anonymity also makes it more difficult for scholars, historians, practitioners, political commentators, and (in the thirty-nine states with elected judges and justices) the electorate, to glean valuable information about legal decisionmakers and the way they make their decisions. The value of determining authorship for unsigned opinions has long been recognized but, until now, the methods of doing so have been cumbersome, imprecise, and altogether unsatisfactory. Our work uses natural language processing to predict authorship of judicial opinions that are unsigned or whose attribution is disputed. Using a dataset of Supreme Court opinions with known authorship, we identify key words and phrases that can, to a high degree of accuracy, predict authorship. Thus, our method makes accessible an important class of cases heretofore inaccessible. For illustrative purposes, we explain our process as applied to the Obamacare decision, in which the authorship of a joint dissent was subject to significant popular speculation. We conclude with a chart predicting the author of every unsigned per curiam opinion during the Roberts Court.

INTRODUCTION
I. UNSIGNED OPINIONS
    A. Historical Context of Unsigned Opinions
    B. Problems with Unsigned Opinions
    C. Solving Attributional Questions the Old-Fashioned Way
    D. Solving Attributional Questions Algorithmically
II. TEST CASE: OBAMACARE
III. EXPERIMENTAL SETUP
    A. Experimental Questions
    B. Data Preparation
    C. Machine Learning System Overview
    D. Design of Authorship Attribution System
        1. Document Representation
        2. Model Selection
        3. Feature Selection
IV. EMPIRICAL RESULTS AND DISCUSSION
    A. Feature Sets and Classification Models
    B. Comparison of Feature Selection Models
    C. Interpreting Authorship Attribution Model Scores
    D. Insights on Writing Styles
    E. Controlling for Clerks
    F. Authorship Prediction for Sebelius
    G. Comparison to Predictions by Domain Experts
    H. Section-by-Section Analysis
V. AUTHORSHIP PREDICTIONS FOR PER CURIAM OPINIONS OF THE ROBERTS COURT
CONCLUSION

Spring 2013] ALGORITHMIC ATTRIBUTION