[Cross-posted at Language Log.]
I’d imagine that most people who’ve been actively involved with corpus linguistics are familiar with the BYU corpora, a collection of web-accessible corpora created by Brigham Young University linguistics professor Mark Davies. These corpora (and BYU’s corpus-linguistics program more generally) have played an essential part in the development of what I’ll call the corpus-linguistic turn in legal interpretation. The BYU corpora served as my entry point into corpus linguistics, and they have supplied the data for most of the law-and-corpus-linguistics work done to date. Beyond that, the BYU Law School has played an enormous role, in a variety of ways, in Law and Corpus Linguistics becoming a thing.
Some of what the law school has been doing has happened largely behind the scenes. For the past two or three years, people there have been developing the Corpus of Founding Era American English (COFEA), a historical corpus intended as a resource for studying language usage in the period leading up to the drafting and ratification of the U.S. Constitution. At this year’s conference on law and corpus linguistics (the third such conference, all of them hosted by the BYU Law School), we were given a preview of COFEA. And via a tweet by the law school’s dean, Gordon Smith, I’ve now learned that a beta version of COFEA is up and available for public playing-around-with, as are beta versions of two other corpora: the Corpus of Early Modern English and the Corpus of Supreme Court of the United States.