Category Archives: Corpus linguistics & lexicography

The coming corpus-based reexamination of the Second Amendment

[An introduction and guide to my series of posts “Corpora and the Second Amendment” is available here.]

It was only three weeks ago that BYU Law School made available two corpora that are intended to provide corpus-linguistic resources for researching the original meaning of the U.S. Constitution. And already the corpora are yielding results that could be very important.

The two corpora are COFEA (the Corpus of Founding Era American English) and COEME (the Corpus of Early Modern English). As I’ve previously explained, COFEA consists of almost 139 million words, drawn from more than 95,000 texts from the period 1760–1799, and COEME consists of 1.28 billion words, from 40,000 texts dating to the period 1475–1800. (The two corpora can be accessed here.)

Within a day after COFEA and COEME became available, Dennis Baron looked at data from the two corpora, to see what they revealed about the meaning of the key phrase in the Second Amendment: keep and bear arms. (Baron was one of the signatories to the linguists’ amicus brief in District of Columbia v. Heller.) He announced his findings here on Language Log, in a comment on my post about the corpora’s unveiling:

Sorry, J. Scalia, you got it wrong in Heller. I just ran “bear arms” through BYU’s EMne [=Early Modern English] and Founding Era American English corpora, and of about 1500 matches (not counting the duplicates), all but a handful are clearly military.

Two weeks later, Baron published an opinion piece in the Washington Post, titled “Antonin Scalia was wrong about the meaning of ‘bear arms’,” in which he repeated the point he had made in his comment, and elaborated on it a little. Out of “about 1,500 separate occurrences of ‘bear arms’ in the 17th and 18th centuries,” he said, “only a handful don’t refer to war, soldiering or organized, armed action.” Based on that fact, Baron said that the two corpora “confirm that the natural meaning of ‘bear arms’ in the framers’ day was military.”

My interest having been piqued, I decided to check out the corpus data myself.

Dennis Baron (in WaPo) on corpus linguistics and “bearing arms”

The Washington Post published an opinion piece earlier today by Dennis Baron, with the self-explanatory title “Antonin Scalia was wrong about the meaning of ‘bear arms.’” The crux of the article:

By Scalia’s logic, the natural meaning of “bear arms” is simply to carry a weapon and has nothing to do with armies. He explained in his opinion: “Although [bear arms] implies that the carrying of the weapon is for the purpose of ‘offensive or defensive action,’ it in no way connotes participation in a structured military organization. From our review of founding-era sources, we conclude that this natural meaning was also the meaning that ‘bear arms’ had in the 18th century. In numerous instances, ‘bear arms’ was unambiguously used to refer to the carrying of weapons outside of an organized militia.”

But Scalia was wrong. Two new databases of English writing from the founding era confirm that “bear arms” is a military term. Non-military uses of “bear arms” are not just rare — they’re almost nonexistent.

A search of Brigham Young University’s new online Corpus of Founding Era American English, with more than 95,000 texts and 138 million words, yields 281 instances of the phrase “bear arms.” BYU’s Corpus of Early Modern English, with 40,000 texts and close to 1.3 billion words, shows 1,572 instances of the phrase. Subtracting about 350 duplicate matches, that leaves about 1,500 separate occurrences of “bear arms” in the 17th and 18th centuries, and only a handful don’t refer to war, soldiering or organized, armed action. These databases confirm that the natural meaning of “bear arms” in the framers’ day was military.
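The arithmetic in the passage just quoted can be checked with a minimal sketch. The hit counts and the duplicate figure are the ones Baron reports; the variable names are mine, and the duplicate count is his approximation ("about 350"), so the result is only a rough total.

```python
# Hit counts for "bear arms" as reported in Baron's article.
cofea_hits = 281    # Corpus of Founding Era American English
coeme_hits = 1572   # Corpus of Early Modern English

# Baron's approximate figure for duplicate matches (an estimate, not an exact count).
approx_duplicates = 350

raw_total = cofea_hits + coeme_hits
distinct_estimate = raw_total - approx_duplicates

print(f"Raw matches across both corpora: {raw_total}")
print(f"Estimated distinct occurrences: {distinct_estimate}")
```

The estimate is consistent with the "about 1,500 separate occurrences" stated in the article.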

The two corpora that Baron used were made available for public use (in beta versions) about two weeks ago; more information about them is available in my post about their public unveiling, “The BYU Law corpora.” Baron (who had joined in the linguistics professors’ amicus brief in Heller) was quick to take advantage of these corpora, and on May 7 he posted this comment on that post (on Language Log):

Sorry, J. Scalia, you got it wrong in Heller. I just ran “bear arms” through BYU’s EMne [=Early Modern English] and Founding Era American English corpora, and of about 1500 matches (not counting the duplicates), all but a handful are clearly military.

Since I thought that this news deserved more attention than it was likely to get in the comment thread, I did a separate post about it: “‘bear arms’ in the BYU Law corpora.” All of which is to say, you read it here first.

[Cross-posted on Language Log.]

BYU Law Review: Special issue on law and corpus linguistics

The B.Y.U. Law Review has published its special issue devoted to the papers presented at the 2017 law-and-corpus-linguistics conference hosted by the B.Y.U. Law School.

One of the papers in the volume is mine: “A Lawyer’s Introduction to Meaning in the Framework of Corpus Linguistics” (abstract; pdf), which discusses a new way of thinking about the issue of word meaning that has developed as a result of the use of corpus linguistics in lexicography. A condensed version of that discussion (very condensed) can be found in my post Meaning in the Framework of Corpus Linguistics.

Of the other papers, there are three that I think will be of the most interest to readers (whether judges, lawyers, or legal academics) who want to learn more about what role corpus linguistics can play in legal interpretation. Two of those papers view the use of corpus linguistics positively; the other is critical of it.

“Empirical” doesn’t necessarily mean “definitively verifiable”

Carissa Hessick and I have been debating the appropriateness of using empirical methods in legal interpretation. The debate began on PrawfsBlawg, then moved over here (with some continued discussion at Prawfs), and then spread to Twitter. The relevant tweets are collected in my previous post, and in this post I’ll respond to Hessick’s most recent points.

As I understand her, Hessick contends that the issue of ordinary meaning isn’t an “empirical question” because the question of how a reasonable person would understand the text is inherently qualitative rather than quantitative, and therefore can’t be answered in a way that is “provable or verifiable.” I accept Hessick’s characterization of the ordinary-meaning issue as being qualitative rather than quantitative, but it doesn’t follow that quantitative information is always irrelevant.

Corpus linguistics and empiricism: A Twitter exchange

My last post, Corpus linguistics: Empiricism and frequency, prompted a Twitter exchange between Carissa Hessick and me, a lightly edited version of which I present here.

Hessick:

One question based on my quick read:  Do you think most people would understand “relying on linguistic intuition” to be an empirical undertaking?  I appreciate the insight into how people’s linguistic intuitions are formed.  But don’t most people think that, if something is an empirical question, that means there is a demonstrably correct answer?

And if we often have different intuitions about what a word means (as the split decisions on ordinary meaning illustrate), and if judges resolve the Q of ordinary meaning by consulting their own intuitions, then how can ordinary meaning be an empirical Q? If I have one intuition and you have another, then how do we demonstrate which is correct and which is incorrect?

Me: Continue reading

Corpus linguistics: Empiricism and frequency

This is the second in a series of posts about the essentially final version of Carissa Hessick’s article Corpus Linguistics and the Criminal Law. The first post dealt mainly with Hessick’s views about how corpus linguistics relates to the ultimate purpose of legal interpretation, which is to determine the legal meaning of the text in dispute. This time around, I’ll be discussing her claim that incorporating corpus linguistics into legal interpretation would radically transform the process of determining the text’s ordinary meaning:

Corpus linguistics reframes the “plain” or “ordinary” meaning inquiry in two ways. First, it claims that ordinary meaning is an empirical question. Second, it tells us that this empirical question ought to be answered by how frequently a term is used in a particular way. Both of these analytical moves represent significant departures from current theories of statutory interpretation, including textualism, and they render statutory interpretation essentially unrecognizable.

This statement is a mixed bag. In one respect, it’s correct. Those who support the use of corpus linguistics in legal interpretation do regard ordinary meaning as an empirical question—or at least as involving empirical questions. In a different respect, it is partly correct but oversimplified. Analysis of frequency data is in fact central to corpus linguistics, but it is not necessarily decisive, and in some cases (perhaps in many cases) it will not be helpful at all. And in a third respect, Hessick’s statement is wrong. Neither the empiricism of corpus linguistics nor the attention it pays to frequency represents a “significant departure” from existing interpretive theories.

Empiricism Continue reading

Artis v. District of Columbia, part 2: Units of meaning and dictionary definitions

Sometimes, it’s immediately obvious from the opinions that a case raises questions about interpretation that are interesting, important, or both. Smith v. United States, in which the question was whether trading a handgun for drugs amounts to “using” it, is a classic example. At first glance, the Supreme Court’s decision in Artis v. District of Columbia doesn’t seem to be in that category. It doesn’t offer interesting linguistic issues that call attention to themselves, except for someone who is familiar with the work of the linguist John Sinclair and the lexicographer Patrick Hanks. But with some digging, Artis yields some issues that I think are interesting and significant, having to do with new approaches to analyzing questions of word meaning and with how not to use dictionaries.
