A New and Improved Corpus of Definitive M&A Agreements for Public Access

The mature field of contract design dates back nearly a century, and it now features myriad rich and varied contributions seeking to characterize or test theories of how parties organize private law to shape and enhance their economic environments. In turn, this literature has spawned legions of intellectual descendants. They include efforts to explore the inherent incompleteness that transaction costs impose on contract structures, the critical importance of governance and control of economic exchange within the contractual boundaries of the firm, the default rules of contract law, and the role of extra-legal norms and enforcement mechanisms. A related literature has invited creative academic thought about how best to organize the “production” of contract provisions themselves, and the choice between designing new language and repurposing terms from legacy transactions.

Yet a third set of contributions attends to the somewhat surprising global properties that ensue when contractual emulation becomes systemic. For example, the rote usage of “boilerplate” provisions may lock in inefficient terms or even obscure the settled meaning of contract language. In a similar vein, boilerplate may become “sticky,” resisting change even when the underlying economics strongly favor it. And the serial layering of boilerplate terms from different legacy transactions can give rise to a complex landscape of hidden traps, loopholes, and counter-loopholes, all conspiring to make the enforcement of boilerplate contracts difficult to predict.

Each of the above contributions advances important testable predictions about contract practices, and a large cohort of scholars in both law and finance has increasingly turned its attention to testing them. This empirical project is critically important, not only for adjudicating between existing theories, but also for generating new ones about how, when, and whether contract design helps or hinders allocative and productive efficiency. Moreover, the emergence of generative AI and machine learning as tools for studying legal text has raised the stakes of empirical contract testing and measurement even further. Now, empirically minded researchers are free to experiment with teasing out new and creative features of contracts, rather than depending on the (somewhat arbitrary) labels that prior researchers constructed. In short, data analytics have become highly democratized.

But data availability still lags far behind. It remains difficult for empirically minded contract scholars to access clean, readable corpora of relevant contractual texts. Those who have built such corpora have at times been reluctant to share them. On other occasions, authors report that they are prohibited from doing so by restrictive licensing practices. These constraints have created a barrier to entry in empirical contract research and make it difficult to accomplish replication, a core condition for the advancement of the field.

One significant domain of high-stakes private contracting is mergers and acquisitions. M&A contracts are among the most lengthy, complicated, and important constructs of modern financial markets, making them worthy of attention. In prior work, three of us introduced a unique corpus of publicly disclosed merger agreements spanning 21 years (2000-2020). (A summary of that work previously appeared in the Blue Sky Blog.) At the time of its publication, this corpus was the largest open-access resource of its kind, and we are unaware of any other contributors who have produced a more comprehensive one since.

That said, data collection and cleaning remain an ongoing, laborious process that is never truly complete. We recently released a substantial augmentation of our prior work. The new study chronicles a far more comprehensive contract corpus for definitive merger agreements (the “DMA Corpus”). We release this improved data set now, in the hopes that doing so will catalyze and further advance much-needed empirical research on contractual design and evolution.

The DMA Corpus expands our earlier corpus from 2,141 to approximately 7,931 definitive merger agreements signed between 2000 and 2020. Created from publicly available SEC filings, it includes the complete text of the agreements and also extracts of key provisions within each contract, identified through machine learning techniques. Parsing agreements in this fashion allows researchers to study dynamic patterns in both entire agreements and individual terms within a larger agreement, further enhancing the utility of the DMA Corpus for a wide variety of uses. We were able to increase its size by relaxing a requirement from our earlier study that each deal appear in both FactSet and the SDC Platinum Financial Securities Data to be included. This constraint significantly limited the size of our prior endeavor, principally due to the sporadic and inconsistent tracking of SDC Platinum.

In our new paper, we implement a different process that combines manual labels and machine learning to reliably identify the relevant contracts, a process that can easily be extended to other legal domains of data collection. The result is a significant improvement on what has been available to researchers. A high-level glance at various corpus measures documents the breadth and span of the improvement over our prior study’s corpus.

In addition to developing the raw corpus, we identify and make available the text of several individual clauses contained in these agreements. In a final step, we provide an illustration of how these data can be used to generate novel insights into M&A contract design and drafting practices.
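As a rough sketch of the kind of clause-level analysis this structure makes possible, the Python snippet below computes, for each signing year, the share of agreements that contain a given extracted clause. Everything about the record layout here is an assumption made for illustration: the field names (`deal_id`, `year`, `clauses`), the clause label `termination_fee`, and the toy records are hypothetical and do not reflect the actual format or contents of the DMA Corpus, which is described in the paper.

```python
from collections import defaultdict

# Toy records in a HYPOTHETICAL layout; the real corpus schema may differ.
agreements = [
    {"deal_id": "A1", "year": 2000, "clauses": {"termination_fee": "sample text"}},
    {"deal_id": "A2", "year": 2000, "clauses": {}},
    {"deal_id": "B1", "year": 2010, "clauses": {"termination_fee": "sample text"}},
    {"deal_id": "B2", "year": 2010, "clauses": {"termination_fee": "sample text"}},
    {"deal_id": "C1", "year": 2020, "clauses": {"termination_fee": "sample text"}},
]


def clause_share_by_year(records, clause_name):
    """Map each year to the fraction of that year's agreements
    that have non-empty extracted text for clause_name."""
    totals = defaultdict(int)  # agreements signed in each year
    hits = defaultdict(int)    # agreements in each year containing the clause
    for rec in records:
        totals[rec["year"]] += 1
        if rec.get("clauses", {}).get(clause_name):
            hits[rec["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


shares = clause_share_by_year(agreements, "termination_fee")
for year, share in shares.items():
    print(f"{year}: {share:.0%}")
```

The same pattern could be adapted to whatever clause labels and file format the released data actually use; the point is only that clause-level extracts allow trends to be tallied without re-parsing the full text of each agreement.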

In the spirit of academic exchange and open access, we are making our data available on a public website through links available in the draft manuscript. We encourage interested researchers, practitioners, students, judges, and other scholars of contract design to use them and to suggest or contribute improvements to the data resource. For more details about access, and an overview of the database, please consult our paper, which provides further description of the DMA Corpus’ salient characteristics, as well as a comparison of trends in this larger data set against those we reported from our prior study.

This post comes to us from Peter Adelson at Stanford University, Matthew Jennejohn at Brigham Young University’s J. Reuben Clark Law School, Julian Nyarko at Stanford Law School, and Eric Talley at Columbia University School of Law. It is based on their recent article, “Introducing a New Corpus of Definitive M&A Agreements, 2000-2020,” available here.

The CLS Blue Sky Blog

Columbia Law School's Blog on Corporations and the Capital Markets
