There’s a long article on the front page of the New York Times about the use of computational linguistics and artificial intelligence in legal practice—specifically, in analyzing truckloads of documents during litigation to separate out what’s relevant and identify factual patterns. (“Armies of Expensive Lawyers, Replaced by Cheaper Software”)
At Language Log, Mark Liberman discusses the article and explains that the new e-discovery applications that have proliferated over the past few years are mainly the result of improvements on existing methods rather than anything fundamentally new.
[T]he ideas and algorithms behind this transition have been developed and demonstrated in research projects over the past 25 years or so. There are some recent new ideas, and there will no doubt be a regular progression of other new ideas in the future. But there isn’t any recent development that deserves to be called a breakthrough. Rather, there are three mutually-reinforcing processes that have been under way for decades, and are now starting to make a practical impact in this as well as other applications of speech and language engineering:
- A gradual accumulation of new techniques and (especially) refinement of older ones, which yield cumulative improvements in performance;
- Constant cost-performance improvements in computers, networks, and storage, which make it possible to apply (new and old) ideas on larger and larger scales, more and more cheaply;
- Increasing digitization of communication and record-keeping, which makes larger stores of data available for training, and also makes deployment of automated systems easier and cheaper.
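To make the "older techniques refined over decades" point concrete: the core of relevance-sorting in e-discovery is plain statistical text classification, of the kind that has been standard since the 1990s. Here is a minimal sketch of a naive Bayes relevance classifier using only the Python standard library; the documents, labels, and function names are all invented for illustration, not taken from any actual e-discovery product.

```python
import math
from collections import Counter, defaultdict

# Toy training set: (document text, label). Both the documents and
# the "relevant"/"irrelevant" labels are made up for this example.
train = [
    ("merger agreement signed with acme corp", "relevant"),
    ("please review the draft contract terms", "relevant"),
    ("litigation hold notice for all email", "relevant"),
    ("lunch menu for the office party", "irrelevant"),
    ("weekly parking garage schedule", "irrelevant"),
    ("holiday party photos attached", "irrelevant"),
]

def train_nb(examples):
    """Count word frequencies per class for a naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Return the most probable class under add-one (Laplace) smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # log prior plus summed log likelihoods of each word
        score = math.log(class_counts[label] / total_docs)
        n_words = sum(word_counts[label].values())
        for w in text.split():
            score += math.log(
                (word_counts[label][w] + 1) / (n_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, class_counts, vocab = train_nb(train)
print(classify("draft merger contract", word_counts, class_counts, vocab))
```

Real e-discovery systems layer far more on top of this (richer features, better models, human-in-the-loop review), but the gains Liberman describes come largely from scaling and refining exactly this kind of long-established method, not from replacing it.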
Of course, this technology is being used for more than just legal applications; it’s becoming ubiquitous. It’s only a matter of time (five years?) before there’s a big lawsuit where language technology is what’s at issue. Maybe a patent case, maybe a suit against a software vendor alleging that their product didn’t perform, maybe something else. But it will happen.