Corpus linguistics: Empiricism and frequency

This is the second in a series of posts about the essentially final version of Carissa Hessick’s article Corpus Linguistics and the Criminal Law. The first post dealt mainly with Hessick’s views about how corpus linguistics relates to the ultimate purpose of legal interpretation, which is to determine the legal meaning of the text in dispute. This time around, I’ll be discussing her claim that incorporating corpus linguistics into legal interpretation would radically transform the process of determining the text’s ordinary meaning:

Corpus linguistics reframes the “plain” or “ordinary” meaning inquiry in two ways. First, it claims that ordinary meaning is an empirical question. Second, it tells us that this empirical question ought to be answered by how frequently a term is used in a particular way. Both of these analytical moves represent significant departures from current theories of statutory interpretation, including textualism, and they render statutory interpretation essentially unrecognizable.

This statement is a mixed bag. In one respect, it’s correct. Those who support the use of corpus linguistics in legal interpretation do regard ordinary meaning as an empirical question, or at least as involving empirical questions. In a different respect, it is partly correct but oversimplified. Analysis of frequency data is in fact central to corpus linguistics, but it is not necessarily decisive, and in some cases (perhaps in many cases) it will not be helpful at all. And in a third respect, Hessick’s statement is wrong. Neither the empiricism of corpus linguistics nor the attention it pays to frequency represents a “significant departure” from existing interpretive theories.


To a large extent, the process of interpretation is empirically based.

Actually, I should say that the processes of comprehension and interpretation are both empirically based. Although both of these processes are often referred to under the rubric of interpretation, I’ve previously explained that I prefer to refer to them by different words. I use comprehension to refer to what I’ve described as the effortless and automatic process by which listeners and readers arrive at an understanding of an utterance or a text, for all intents and purposes simultaneously with hearing or reading it. I use the word interpretation, on the other hand, to mean the deliberative process by which a decisionmaker consciously decides what meaning should be imputed to the utterance or text.

Of the two processes, comprehension is the more basic one—its output provides the object upon which the process of interpretation operates—so that’s where we’ll start.

When I say that comprehension is empirically-based, what I mean is that it draws on the comprehender’s knowledge of the relevant language and their ability to use and understand it, knowledge and ability that are the product of the comprehender’s experience with the language. There is, of course, some innate cognitive machinery that enables that experience to give rise to the ability to use and understand the language, but whatever that machinery might consist of (and that is a matter of substantial controversy), it does not extend to the specific linguistic ingredients that go into a specific language, such as English or Catalan or Hindi.

For the most part, the process of comprehension goes on below the level of conscious awareness. When we hear an utterance or read a text, we typically understand it immediately, with no perceptible delay and no sensation of mental wheels turning as the utterance or text is processed. On occasion, though, the process becomes a focus of attention, such as when the input is hard to understand. And it becomes a focus of attention for lawyers and judges when a question arises about what a particular chunk of text means. At that point, the process of interpretation begins, as the lawyer or judge starts thinking consciously about what the language in dispute means.

Although I treat interpretation as separate from comprehension, the two processes are connected in at least two ways. First, the understanding of the text that emerges from the comprehension process provides the raw material upon which the interpretation process operates. The lawyer or judge’s own understanding of the disputed word or phrase will inevitably play a role in interpreting it. And perhaps more importantly, the interpretation process will have to take account of the linguistic context (more specifically, the co-text) in which the disputed expression appears, and that process draws on comprehension.

Second, in trying to arrive at the text’s ordinary meaning, the lawyer or judge will consciously direct their attention to the body of acquired linguistic knowledge that makes comprehension possible. That process is often described (especially in discussions of corpus linguistics) as relying on linguistic intuitions. But such intuitions don’t just appear out of thin air. They arise out of the kind of linguistic experience I’ve been talking about, and that makes them empirically-based.

As an example of this kind of analysis, consider Justice Scalia’s famous dissent from the holding in Smith v. United States that trading a gun for drugs violates a statute that prohibits “using” a firearm in connection with a drug crime:

To use an instrumentality ordinarily means to use it for its intended purpose. When someone asks, “Do you use a cane?,” he is not inquiring whether you have your grandfather’s silver-handled walking stick on display in the hall; he wants to know whether you walk with a cane. Similarly, to speak of “using a firearm” is to speak of using it for its distinctive purpose, i.e., as a weapon. To be sure, “one can use a firearm in a number of ways,” including as an article of exchange, just as one can “use” a cane as a hall decoration—but that is not the ordinary meaning of “using” the one or the other.

The Court does not appear to grasp the distinction between how a word can be used and how it ordinarily is used. It would, indeed, be “both reasonable and normal to say that petitioner ‘used’ his MAC–10 in his drug trafficking offense by trading it for cocaine.” Ibid. It would also be reasonable and normal to say that he “used” it to scratch his head. When one wishes to describe the action of employing the instrument of a firearm for such unusual purposes, “use” is assuredly a verb one could select. But that says nothing about whether the ordinary meaning of the phrase “uses a firearm” embraces such extraordinary employments. It is unquestionably not reasonable and normal, I think, to say simply “do not use firearms” when one means to prohibit selling or scratching with them.

Scalia’s linguistic judgments here are based on his feel for the English language, which he developed as a result of a lifetime of experience speaking and understanding English.

Later, in Watson v. United States, the Court’s opinion (by Justice Souter this time) used a similar mode of analysis in holding that receiving a gun in exchange for drugs (the mirror-image of the transaction in Smith) does not constitute using the gun:

The Government may say that a person “uses” a firearm simply by receiving it in a barter transaction, but no one else would. A boy who trades an apple to get a granola bar is sensibly said to use the apple, but one would never guess which way this commerce actually flowed from hearing that the boy used the granola.

As with Scalia’s dissent in Smith, the linguistic judgments here are based solely on an intuition about how people would be expected to understand the statutory language at issue.

This style of argument is nothing new. An example of it appears in the report of an English case from 1677, Bell v. Knight. A tax had been imposed on “every fire-hearth and stove in every house,” but the taxing statute provided “that this Act shall not extend to charge any blowing-house, stamp, furnace, or kiln, &c.” (According to the OED, a blowing-house was “a tin-smelting house.”) The question before the court was whether the tax applied to smith’s forges, and counsel for the tax-collector argued that “if a traveler should enquire for a blowing-house, nobody would send him to a smith’s forge.” This argument, too, is empirically-based, in that it grew out of the lawyer’s experience in using the term blowing house and in hearing it used, experience that he must have assumed was shared by the judges who heard the case.

By the same token, Blackstone was being an empiricist when he wrote, “Words are to be understood in their usual and most known signification; not so much regarding the propriety of grammar as their general and popular use.” Although he was invoking a simplistic conception of word meaning, one that seemingly regarded words as having only a single “usual and most known signification,” identifying such a meaning is inherently an empirical task.

More recently, Reed Dickerson wrote in The Interpretation and Application of Statutes (1975), “True meaning…is always a matter of fact and it is to be found, not by asking people how they believe they use language, but by observing how they actually use it.” And the Supreme Court made a similar point in Watson: in the absence of a statutory definition or other clear indication, the meaning of the language at issue “has to turn on the language as we normally speak it.”


A good point of entry into the issue of frequency is provided by the Blackstone quote I discussed above. “Words,” he said, “are to be understood in their usual and most known signification.” It seems to me that under any reasonable approach to determining a word’s usual and most known meaning, the frequency with which the word’s various senses are believed to appear will inevitably be a factor. If that is correct, then frequency has been at least implicitly relevant to legal interpretation for 350 years. And an echo of Blackstone’s phrase “usual and most known signification” can be heard in Watson’s reference to “the language as we normally speak it.”

There are also other indications in the case law that the frequency of a word’s various senses is relevant to the question of the word’s ordinary meaning (keeping in mind that I’m talking about the frequency of the word’s senses when it is used in the relevant kind of context, and the ordinary meaning of the word in that kind of context). For example, courts often equate a word’s ordinary meaning with its most common meaning, or link the two concepts together:

I agree with the majority that the term “another” should be given its ordinary meaning and that the most common usage of “another” as a pronoun is “an additional one” or “one more.”

United States v. Brumley (5th Cir. 1996) (dissenting op.) [link].

The term “exchange,” in its most common, ordinary meaning implies an act of giving one thing in return for another thing regarded as an equivalent.

Yarbro v. C.I.R. (5th Cir. 1984) [link].

Such an interpretation comports well with the statute’s purposes in addition to following the most common and ordinary meaning of its language.

United States v. 122,942 Shares of Common Stock (N.D. Ill. 1994) [link].

In other cases, courts focus on what the word at issue “ordinarily” means or how it is “ordinarily” or “usually” used or understood, often equating that with the word’s “ordinary meaning”:

Because the TVPA does not define the term “individual,” we look first to the word’s ordinary meaning…. As a noun, “individual” ordinarily means “[a] human being, a person.” 7 Oxford English Dictionary 880 (2d ed. 1989)…. After all, that is how we use the word in everyday parlance. We say “the individual went to the store,” “the individual left the room,” and “the individual took the car,” each time referring unmistakably to a natural person. And no one, we hazard to guess, refers in normal parlance to an organization as an “individual.” Evidencing that common usage, this Court routinely uses “individual” to denote a natural person, and in particular to distinguish between a natural person and a corporation.

Mohamad v. Palestinian Authority (U.S. 2012) [link].

I find the definitions HomeAway offers accurately convey the ordinary meaning of “proprietor,” especially when taken in the hotel context. One would not ordinarily understand “proprietor of a hotel” to include people who simply have control over the hotel, use the hotel, or have the right to enjoy the use and advantages of a hotel. Instead, one would understand a proprietor of a hotel to be the owner of the hotel—ownership is a central element of proprietorship.

City of Portland v. HomeAway.com, Inc. (D. Or. 2016) [link].

Since [the word “discharge”] is neither defined in the statute nor a term of art, we are left to construe it in accordance with its ordinary or natural meaning. When it applies to water, “discharge” commonly means a “flowing or issuing out,” Webster’s New International Dictionary 742 (2d ed.1954), and this ordinary sense has consistently been the meaning intended when this Court has used the term in prior water cases.

S.D. Warren Co. v. Maine Bd. of Environmental Protection (U.S. 2006) [link].

The first rule in considering the meaning and effect of a statute is to construe it just as it reads, giving the words their ordinary meaning and usually accepted meaning in common language.

Johnson v. Bonds Fertilizer, Inc. (Ark. 2006) [link].

Unambiguous language may not be interpreted to contradict its plain meaning. A corollary of this rule is that a statutory term should be interpreted and applied according to its usually accepted meaning, where the ordinary meaning of the term results in an application that is neither unreasonably confused, inoperable, nor in blatant contradiction of the express purpose of the statute.

… In concluding that the definition of full-time service sets a boundary which defines overtime and that any hours worked in excess of forty per week constitute overtime, the court of appeals ignored the usually accepted meaning of the term.

O’Keefe v. Utah State Retirement Board (Utah 1998) [link].

As Appellant notes, the literal language of Section 610.122(2) states simply that no charges “will be pursued” as a result of the arrest; and the plain and ordinary meaning of those words is ordinarily understood to refer to the future tense only.

Martinez v. State (Mo. Ct. App. 2000) [link].

To say that a word is “ordinarily” or “usually” or “commonly” used or understood in a certain way is necessarily to say that it is frequently used or understood in that way. That being the case, it seems to me to be entirely appropriate to consider data about the frequency with which the competing senses of the word at issue occur when the word is used in the relevant type of context. (For a short explanation of what I mean by “the relevant type of context,” see Meaning in the framework of corpus linguistics, and for more detail, see this paper.)
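As a rough illustration of what considering such data might involve, here is a minimal sketch in Python. It assumes that someone has already pulled concordance lines from a corpus and hand-coded each one for the type of context and the sense in which the word is used. The example lines, the context labels, the sense labels, and the function name `sense_shares` are all invented for illustration; none of this is real corpus data, and the resulting proportions would be one input into the analysis rather than a decisive answer.

```python
from collections import Counter

# Hypothetical, hand-coded concordance lines: (text, context_type, sense_label).
# These rows are invented for illustration; they are not real corpus data.
CODED_LINES = [
    ("he used a firearm to threaten the clerk", "use + firearm", "as a weapon"),
    ("she used the gun to fire a warning shot", "use + firearm", "as a weapon"),
    ("they used a pistol as collateral for the loan", "use + firearm", "as an item of trade"),
    ("he used a cane after the surgery", "use + cane", "as a walking aid"),
]


def sense_shares(lines, context_type):
    """Return each sense's share of the coded lines for one context type."""
    # Count only the lines that belong to the relevant type of context.
    counts = Counter(sense for _, ctx, sense in lines if ctx == context_type)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sense: n / total for sense, n in counts.items()}


shares = sense_shares(CODED_LINES, "use + firearm")
# Print the senses from most to least frequent within the chosen context.
for sense, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{sense}: {share:.0%}")
```

The filtering step is the important design choice here: the tally is restricted to the relevant type of context, so that the frequencies being compared are those of the competing senses in that context rather than across every use of the word.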

So although corpus linguistics provides a new methodology for analyzing issues of ordinary meaning, it doesn’t represent a “significant departure from current theories of statutory interpretation,” much less “render statutory interpretation essentially unrecognizable.”


Some of the quotations in this post have been reformatted and/or cleaned up.

