The coming corpus-based reexamination of the Second Amendment

It was only three weeks ago that BYU Law School made available two corpora that are intended to provide corpus-linguistic resources for researching the original meaning of the U.S. Constitution. And already the corpora are yielding results that could be very important.

The two corpora are COFEA (the Corpus of Founding Era American English) and COEME (the Corpus of Early Modern English). As I’ve previously explained, COFEA consists of almost 139 million words, drawn from more than 95,000 texts from the period 1760–1799, and COEME consists of 1.28 billion words, from 40,000 texts dating to the period 1475–1800. (The two corpora can be accessed here.)

Within a day after COFEA and COEME became available, Dennis Baron looked at data from the two corpora, to see what they revealed about the meaning of the key phrase in the Second Amendment: keep and bear arms. (Baron was one of the signatories to the linguists’ amicus brief in District of Columbia v. Heller.) He announced his findings here on Language Log, in a comment on my post about the corpora’s unveiling:

Sorry, J. Scalia, you got it wrong in Heller. I just ran “bear arms” through BYU’s EMne [=Early Modern English] and Founding Era American English corpora, and of about 1500 matches (not counting the duplicates), all but a handful are clearly military.

Two weeks later, Baron published an opinion piece in the Washington Post, titled “Antonin Scalia was wrong about the meaning of ‘bear arms’,” in which he repeated the point he had made in his comment, and elaborated on it a little. Out of “about 1,500 separate occurrences of ‘bear arms’ in the 17th and 18th centuries,” he said, “only a handful don’t refer to war, soldiering or organized, armed action.” Based on that fact, Baron said that the two corpora “confirm that the natural meaning of ‘bear arms’ in the framers’ day was military.”

My interest having been piqued, I decided to check out the corpus data myself.

I’ve now downloaded from COFEA what I think is the most directly relevant data, including what I think are all instances of the constructions bear arms, bears arms, bearing arms, bore arms, and born(e) arms. I’ve made an initial coding pass through that data, and also taken a quick look at some additional data that I downloaded.

Having done that, while I’m not ready to go as far as Baron does, I do think that what I’ve seen provides a substantial basis for challenging Heller’s interpretation of keep and bear arms.

However, the purpose of this post isn’t to talk about what the data shows; I’ll reserve that for a series of posts over the next several weeks. Rather, my purpose here is to prime the pump for the discussion that I think the corpus data is going to generate.

I want to encourage people who are interested in the issue to start looking at the corpus data for themselves. To begin with, anyone who wants to intelligently participate in the discussion will have to get their hands dirty with the data.

And beyond that, I think that this can be a good way to introduce lawyers, law professors, and (hopefully) judges to some of the ways in which corpus analysis can be used in legal interpretation. The data will be discussed in the posts that I will be doing, and presumably in publications and posts by others. Those discussions will provide demonstrations of corpus analysis in action, and readers having the data will be able to refer to it in following the analysis, like opera-goers following along in the libretto.

In addition to wanting to encourage people to engage with the data, I want to help enable them to do so. To that end, I will be posting links to downloadable spreadsheets that will include much of the data that I will be analyzing. You will find the first such link below, and I will provide links to additional data periodically.

Posting the data this way will make it easily accessible to people who’ve had no prior experience with corpora, or who’d simply rather not do the work needed to download the data themselves. It might also make it possible for there to be a single agreed-upon data set for people to work from, which would facilitate discussion and debate. (I realize that if the data I post is going to serve that purpose, I’ll have to be open to reasonable suggestions about additional data to include.)

Because the search results may include duplicates and near-duplicates, the data I post will be deduped before posting. And to provide visibility into the deduping process, each spreadsheet will include not only the deduped data, but also the raw data and the duplicates that were extracted (along with warnings not to use either of those for analysis).
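For readers who would like to experiment with deduping on their own, here is a minimal sketch in Python of one way to separate raw corpus lines into a deduped set and a set of extracted duplicates. It is only an illustration: the normalization rule (ignoring case, punctuation, and spacing) and the function names are assumptions made for this example, not a description of the procedure actually used for the spreadsheets, and it would not catch near-duplicates that differ in wording.

```python
import re

def normalize(line: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace, so that lines
    differing only in those respects compare as equal (an assumed rule)."""
    line = re.sub(r"[^\w\s]", "", line.lower())
    return " ".join(line.split())

def dedupe(lines):
    """Return (kept, duplicates), keeping the first occurrence of each line."""
    seen = set()
    kept, duplicates = [], []
    for line in lines:
        key = normalize(line)
        if key in seen:
            duplicates.append(line)  # set aside, not discarded
        else:
            seen.add(key)
            kept.append(line)
    return kept, duplicates

# Made-up sample lines, for illustration only.
raw = [
    "the right of the people to bear arms",
    "The right of the people to bear arms.",
    "every man able to bear arms",
]
kept, duplicates = dedupe(raw)
```

Keeping the duplicates in a separate list, rather than throwing them away, mirrors the idea of making the deduping visible so that others can check it.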

With that background out of the way, you can access the first batch of data here. It consists of

  • all instances in COFEA of arms appearing within one word to the right of any form of the verb bear (bears, bearing, etc.) (388 corpus lines after deduping and other cleanup, as explained in the spreadsheet),
  • all instances in COFEA of arms appearing within three words to the right of any form of the verb keep (keeps, keeping, etc.) (57 corpus lines after deduping and other cleanup, as explained in the spreadsheet), and
  • 1,000 instances in COFEA of arms, with no collocates specified (993 after deduping), out of a total of 24,236.
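
To make the idea of "within N words to the right" concrete, here is a rough sketch in Python of that kind of window search applied to a plain line of text. It is not how the corpus interface works internally; the lists of verb forms, the simple tokenizer, and the function name are all assumptions made for illustration.

```python
import re

# Assumed lists of inflected forms; a corpus's own lemma lists may differ.
BEAR_FORMS = {"bear", "bears", "bearing", "bore", "born", "borne"}
KEEP_FORMS = {"keep", "keeps", "keeping", "kept"}

def has_collocate(text, verb_forms, target="arms", window=1):
    """True if `target` occurs within `window` words to the right of
    any word in `verb_forms`."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer
    for i, tok in enumerate(tokens):
        if tok in verb_forms and target in tokens[i + 1 : i + 1 + window]:
            return True
    return False

# Made-up example lines, not drawn from COFEA.
bear_hit = has_collocate("all men able to bear arms", BEAR_FORMS, window=1)
bear_miss = has_collocate("they bear their arms", BEAR_FORMS, window=1)
keep_hit = has_collocate("to keep their own arms", KEEP_FORMS, window=3)
```

With a window of one, only arms immediately following the verb counts as a match, which is why the second example does not match; the wider window of three allows for intervening words such as their or own.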

There were no instances in COFEA of the phrase “keep and bear arms,” except for instances that are found in what appears to be the text of the Second Amendment. For obvious reasons, those instances shouldn’t be included in the analysis of the data.


