The B.Y.U. Law Review has published its special issue devoted to the papers presented at the 2017 law-and-corpus-linguistics conference hosted by the B.Y.U. Law School.
One of the papers in the volume is mine: “A Lawyer’s Introduction to Meaning in the Framework of Corpus Linguistics” (abstract; pdf), which discusses a new way of thinking about the issue of word meaning that has developed as a result of the use of corpus linguistics in lexicography. A condensed version of that discussion (very condensed) can be found in my post Meaning in the Framework of Corpus Linguistics.
Of the other papers, there are three that I think will be of the most interest to readers (whether judges, lawyers, or legal academics) who want to learn more about what role corpus linguistics can play in legal interpretation. Two of those papers view the use of corpus linguistics positively; the other is critical of it.
Both of the favorable papers are collaborations between a law professor with a PhD in linguistics and a linguistics professor who specializes in corpus linguistics:
This Article discusses how corpus analysis, and similar empirically based methods of language study, can help inform judicial assessments about language meaning. We first briefly outline our view of legal language and interpretation in order to underscore the importance of the ordinary meaning doctrine, and thus the relevance of tools such as corpus analysis, to legal interpretation. Despite the heterogeneity of the judicial interpretive process, and the importance of the specific context relevant to the statute at issue, conventions of meaning that cut across contexts are a necessary aspect of legal interpretation. Because ordinary meaning must in some sense be generalizable across contexts, it would seem to be subject in some way to the empirical verification that corpus analysis can provide. We demonstrate the potential of corpus analysis through the study of two rather infamous cases in which the reviewing courts made various general claims about language meaning. In both cases, United States v. Costello and Smith v. United States, the courts made statements about language that are contradicted by corpus analysis. We also demonstrate the potential of corpus analysis through Hart’s no-vehicles-in-the-park hypothetical. A discussion of how to approach Hart’s hypothetical shows the potential but also the complexities of the kind of linguistic analyses required by such scenarios. Corpus linguistics can yield results that are relevant to legal interpretation, but performing the necessary analyses is complex and requires significant training in order to perform competently. We conclude that while it is doubtful that judges will themselves become proficient at corpus linguistics, they should be receptive to the expert testimony of corpus linguists in appropriate circumstances.
In this paper, we set out to explore conditions in which the use of large linguistic corpora can be optimally employed by judges and others tasked with construing authoritative legal documents. Linguistic corpora, sometimes containing billions of words, are a source of information about the distribution of language usage. Thus, corpora and the tools for using them are most likely to assist in addressing legal issues when the law considers the distribution of language usage to be legally relevant. As Thomas R. Lee and Stephen C. Mouritsen have so ably demonstrated in earlier work, corpus analysis is especially helpful when the legal standard for construction is the ordinary meaning of the document’s terms. We argue here that four issues should be addressed before determining that corpus analysis is likely to be maximally convincing. First, the legal issue before the court must be about the distribution of linguistic facts. Second, the court must decide what makes an interpretation “ordinary.” Third, if one wishes to search a corpus to glean the ordinary meaning of a term, one must decide in advance what to search. Fourth, there are different reasons as to why a particular meaning might present a weak showing in a corpus search, and these need to be understood. Each of these issues is described and discussed.
The article that is critical of corpus linguistics is “Corpus Linguistics and the Criminal Law,” by Carissa Byrne Hessick (abstract, pdf). Although Hessick’s piece is nominally a response to the Gries & Slocum paper, it is in fact a broad argument against using corpus linguistics in interpreting criminal statutes (and by implication, in statutory interpretation generally):
This brief response to Ordinary Meaning and Corpus Linguistics, an article by Stefan Gries and Brian Slocum, explains why corpus linguistics represents a radical break from current statutory interpretation practice, and it argues that corpus linguistics ought not be adopted as an interpretive theory for criminal laws. Corpus linguistics has superficial appeal because it promises to increase predictability and to decrease the role of judges’ personal preferences in statutory interpretation. But there are reasons to doubt that corpus linguistics can achieve these goals. More importantly, corpus linguistics sacrifices other, more important values, including notice and accountability.
As you’ll know if you’ve been reading this blog over the past six months or so, I disagree with much of what Hessick says. I’ve previously discussed her paper and the issues it raises, both here and in the comments on her posts about the paper on PrawfsBlawg:
Posts by Hessick on PrawfsBlawg
“Corpus linguistics and criminal law.” September 6, 2017 (link)
“More on corpus linguistics and the criminal law.” September 11, 2017 (link)
“Corpus linguistics re-redux.” September 25, 2017 (link)
My posts here
“Some comments on Hessick on corpus linguistics.” September 13, 2017 (link)
“Meaning in the framework of corpus linguistics.” September 21, 2017 (link)
“More on the relevance of frequency data: Responding to Steinberg.” January 2, 2018 (link)
“Responding further to Hessick on corpus linguistics (The first in a series).” January 19, 2018 (link)
“Corpus linguistics: Empiricism and frequency.” March 22, 2018 (link)
“Corpus linguistics and empiricism: A Twitter exchange.” March 24, 2018 (link)
“‘Empirical’ doesn’t necessarily mean ‘definitively verifiable.’” April 2, 2018 (link)
Several of the papers in the special issue are related in one way or another to originalism. Larry Solum contributes “Triangulating Public Meaning: Corpus Linguistics, Immersion, and the Constitutional Record” (pdf, abstract), which seeks to situate corpus linguistics within a broader approach to originalist methodology:
This Article contributes to the development of an originalist methodology by making the case for an approach that employs three distinct methods, each of which serves as a basis for confirming or questioning the results reached by the other two. This approach will be called the Method of Triangulation. The three component techniques are as follows: 1. The Method of Corpus Linguistics: The method of corpus linguistics employs large-scale data sets (corpora) that provide evidence of linguistic practice. 2. The Originalist Method of Immersion: The method of immersion requires researchers to immerse themselves in the linguistic and conceptual world of the authors and readers of the constitutional provision being studied. 3. The Method of Studying the Constitutional Record: The method of studying the record framing, ratification, and implementation requires the researcher to examine the drafting process, including sources upon which the drafters relied, debates during the drafting and ratification process, and the early history of implementation of the constitutional provision. These three methods each provide different inputs into the process of constitutional interpretation and construction. Because each method can be checked against the others, the combination of the three methods results in what can be called “triangulation.”
A response to Solum’s paper is offered by Jake Linford in “Datamining the Meaning(s) of Progress” (abstract, pdf). Linford’s paper “agrees in large part with Professor Solum’s prescription and focuses on how to best use corpus lexicography to confirm or refute other evidence of original public meaning.” As a case study, he focuses on “how corpus lexicography might build on prior scholarly work which analyzes the language of Article I, Section 8, Clause 8 of the Constitution (“the Copyright and Patent Clause”).” And he deals with a potential problem that is noted by Solum: the fact that “corpus lexicography might often fail to recognize an attempt to use an existing word in a new way to create meaning.” In particular, Linford’s paper “highlights one potential approach to corpus construction that might ameliorate this limitation by treating the drafting and ratification of the Constitution as an inflection point from which we might measure semantic shift—the creation of new meaning.”
Lee Strang, too, offers a case study in order to test “corpus linguistics’ capacity to increase originalism’s methodological accuracy.” His paper is titled “The Original Meaning of ‘religion’ in the First Amendment: A Test Case of Originalism’s Utilization of Corpus Linguistics” (abstract, pdf). He describes the paper as accomplishing three goals:
First, it provides a practical example of the application of corpus linguistics to originalism. This affords a first-cut illustration of the extent to which corpus linguistics can make originalism’s methodology more rigorous. Second, this Essay utilizes the tools of corpus linguistics to provide additional evidence of the original meaning of “religion” in the First Amendment. Third, based on this experience, it describes some of the challenges originalist scholars will likely face employing corpus linguistics.
Jennifer Mascott’s paper, “The Dictionary as a Specialized Corpus” (abstract, pdf) expands on one aspect of her research into the original meaning of the Appointments Clause of the Constitution, currently before the Supreme Court in a case that I’ve previously talked about:
Scholars consider reliance on dictionary definitions to be the antithesis of objective, big-data analysis of ordinary meaning. This Article contests that notion, arguing that when dictionaries are treated as a specialized database, or corpus, they provide invaluable textured understanding of a term. Words appear in dictionaries both as terms being defined and as terms defining other words. Examination of every reference to a contested term throughout a dictionary’s definitional entries of other words may substantially benefit statutory and constitutional interpretation. Because dictionaries catalog language, their use as a specialized corpus provides invaluable insight into the ways a particular word is used in relation to terms throughout the English language. Such evidence provides a crucial interpretive launchpad, even for corpus-based researchers looking for a collection of possible word meanings to analyze in a database of ordinary-language documents.
Of the remaining papers, one deals with questions of corpus design and analytical methodology: “Advancing Law and Corpus Linguistics: Importing Principles and Practices from Survey and Content Analysis Methodologies to Improve Corpus Design and Analysis,” by James C. Phillips and Jesse Egbert (abstract, pdf; see also the commentary by Ed Finegan). The issue of corpus design is important, but it is probably something that will be of interest only to people who have a fair amount of corpus-linguistic expertise. The issue of analytical methodology will be of more widespread interest.
One point that I want to note about the latter issue is Phillips and Egbert’s recommendation that those undertaking corpus analyses use multiple coders to categorize the search results and that the coders work independently of each other and be kept ignorant of the purposes of the study. The purpose of those recommendations is to try to maximize the objectivity and reliability of the analysis. That’s a goal that obviously makes sense in the academic context, but what about situations in which corpus analysis is conducted as part of a litigant’s legal argument?
On the one hand, maybe the kinds of precautions discussed by Phillips and Egbert are needed even more in the context of litigation; the incentive to put a thumb on the scale is probably stronger in that context, or at least more obvious, than in academia. But on the other hand, nobody expects objectivity in legal briefs (which isn’t to say that it’s never found there). And more importantly, any analysis provided by one of the parties will be subject to review and challenge by the opposing party, who will be strongly motivated to poke holes in the analysis.
So my initial reaction to this aspect of Phillips and Egbert’s paper is that the measures they recommend won’t be necessary in the context of litigation. (However, note that I’m talking about the use of corpus linguistics as a fully integrated part of the lawyer’s legal analysis, not about the presentation of corpus analysis through an expert witness, which raises different considerations.)
In any event, the issues that Phillips and Egbert raise will need to be discussed, with an eye toward trying to develop a consensus about what practices should be followed.
The last two papers (one of which is a commentary on the other) approach corpus linguistics from a perspective very different from that of using it as a tool in legal interpretation. That use, which is what LAWnLinguistics focuses on, involves incorporating corpus linguistics into legal analysis. The perspective of the last two papers, on the other hand, is purely academic: rather than proposing the use of corpus linguistics in litigating, deciding, or analyzing the content of the law, their aim (as I understand it) is primarily to use corpus linguistics to study legal language and the use of language by actors in the legal system.
Hanjo Hamann & Friedemann Vogel, Evidence-Based Jurisprudence Meets Legal Linguistics — Unlikely Blends Made in Germany (pdf)
German legal thinking is renowned for its hair-splittingly sophisticated dogmatism. Yet, some of its other contributions to research are frequently overlooked, both at home and abroad. Two such secondary streams recently coalesced into a new corpus-based research approach to legal practice: Empirical legal research (which had already developed in Germany by 1913) and research on language and law (following German pragmatist philosopher Ludwig Wittgenstein’s work of 1953). This Article introduces both research traditions in their current German incarnations (Evidence-Based Jurisprudence and Legal Linguistics) and shows how three common features—their pragmatist observation of social practices, their interest in dissecting legal authority, and their big data strategy—inspired a new, corpus-based research agenda, Computer Assisted Legal Linguistics (CAL²).
By offering an international and interdisciplinary point of comparison, Hamann and Vogel demonstrate that current American forays into corpus-based legal scholarship reflect only a small sliver of the full range of possibilities for such research. This Comment considers several key branching points that may lie ahead, as the nascent literature begins to mature. In particular, the Comment examines two vexing ambiguities in the corpus-linguistic agenda: the first centers on the ambiguous meaning of legal “empiricism”; the second, on the ambiguous relationship between words and actions. To achieve its full potential, legal corpus linguistics will need to move beyond mere description, to identify patterned configurations, to interpret cultural meanings, and to trace causal processes. To do so effectively, researchers will need to look beyond legal corpora alone, to explore the varied and complex relationships between texts and acts, and between legal institutions and the surrounding society.