Compensation Disclosure: A Study Based on Semantic Similarity

How companies determine executive compensation plays a critical role in corporate governance by helping recruit, motivate, and retain key employees. Not surprisingly, it has attracted broad attention from academics, investors, regulators, and the general public. In a recent study, we introduce a toolset for extracting relevant information from narrative disclosures and seek to provide novel insights into the compensation-setting process and facilitate further research on firms’ compensation decisions.

Research on executive compensation has largely focused on quantitative measures such as total pay level, cash bonus, and equity incentives. Given the wealth of other information that firms disclose about executive compensation, however, such metrics might not be sufficient to explain firms’ compensation practices. In our study, we propose that firms’ narrative compensation disclosures in their proxy statements contain important information.

We adopt state-of-the-art natural language processing (“NLP”) techniques that capture the semantics of disclosures, then assess whether firms that face similar economic factors or influences release similar narrative compensation disclosures. Comparing the content of compensation disclosures is not a trivial task. Disclosures about executive compensation are highly specific and, because they essentially cover only one topic, tend to use very similar language. This specificity of language makes it difficult to identify differential information. Therefore, it is critical to extract the semantics, or underlying information, of the disclosure.

Prior research has typically studied the content of narrative disclosures using “bag-of-words” (“BOW”) approaches. BOW techniques compare words but ignore relationships between words, such as synonymy, as well as details such as specificity and conjugation. Because of these limitations, prior studies have generally focused on comparing disclosures within firms over time, as this controls for firm-specific features such as disclosure style, word choices, jargon, and proper nouns. Executive compensation disclosures are especially vulnerable to the shortcomings of BOW-based approaches given the specificity of these disclosures. Thus, we apply more recent NLP techniques to capture the semantics of disclosure, so that we can more effectively compare the underlying information being disclosed.
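To make the limitation concrete, here is a minimal Python sketch of one common BOW comparison: cosine similarity over raw word counts. It is an illustration only, not the specific BOW proxies used in the study. The two sentences and the helper names (`bow_vector`, `cosine`) are hypothetical.

```python
import math
import re
from collections import Counter

def bow_vector(text):
    """Count lowercase word tokens; word order and meaning are discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical sentences (not taken from any filing) that say roughly
# the same thing while sharing no words.
doc_a = "Bonuses are tied to earnings"
doc_b = "Incentive pay depends on net income"

similarity = cosine(bow_vector(doc_a), bow_vector(doc_b))
print(similarity)  # no overlapping tokens, so the score is zero
```

Because the score depends only on overlapping tokens, two statements with the same meaning but different wording look unrelated under this kind of measure.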

The methodology we adopt is generally termed “document embeddings.” This approach uses machine learning to identify the underlying relationships between words in the English language. Document embedding essentially “learns” a language and world model from a large set of texts. In our context, we train the algorithm on a set of business disclosures, so the learning captures both general linguistic concepts (e.g., synonyms and conjugation) and business-specific concepts and relationships (e.g., accounting jargon and company-product relations). The most important feature for our setting, however, is that document embedding methods are agnostic to word choice. For example, compensation related to earnings and compensation related to net income would be understood to be similar concepts.
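The intuition behind word-choice agnosticism can be sketched with a toy example. The snippet below is not the trained document embedding model described above. As a simplified stand-in, it averages hand-picked, hypothetical three-dimensional word vectors and compares the results with cosine similarity. Every vector value and function name here is an assumption made for illustration.

```python
import math

# Hypothetical word vectors chosen by hand for illustration only.
# A trained model would learn such vectors from a large corpus.
WORD_VECTORS = {
    "earnings": (0.9, 0.1, 0.0),
    "income":   (0.8, 0.2, 0.0),
    "bonus":    (0.1, 0.9, 0.0),
    "pay":      (0.2, 0.8, 0.1),
    "weather":  (0.0, 0.0, 1.0),
    "forecast": (0.0, 0.1, 0.9),
}

def embed(text):
    """Average the vectors of known words (a simple stand-in for a document embedding)."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        raise ValueError("no known words in text")
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

sim_related = cosine(embed("bonus earnings"), embed("pay income"))
sim_unrelated = cosine(embed("bonus earnings"), embed("weather forecast"))
print(sim_related > sim_unrelated)
```

The first pair shares no words yet scores as highly similar because the underlying vectors point in nearly the same direction, which is the property that exact word matching lacks.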

Using our new measure in a large sample of compensation disclosures extracted from proxy statements of U.S. public firms over 1995–2020, we show that firms of similar size have more similar compensation disclosures. After controlling for disclosure length and common effects for all firms in each year, we also find less average similarity and more variation in the semantic information among small firms than among large ones. This is consistent with more compensation disclosure “herding” or “convergence” among the larger firms, potentially due to more public scrutiny or other market forces. In contrast, the proxies using BOW-based approaches generally do not demonstrate these patterns in similarity. Interestingly, we find evidence that similarity based on exact-language-matching BOW approaches is mechanically driven by document length (longer documents appear more similar under BOW-based proxies), which correlates strongly with firm size.

We also show that firms in the same industry have more similar compensation disclosures, compared with firms in different industries. The semantic similarity increases as our definition of industry becomes finer (using the hierarchical GICS industry codes). Again, in general, we do not observe this relationship when we use proxies based on the traditional BOW-based approaches. The results suggest that our semantic similarity measure can better differentiate the similarity in compensation disclosures across firms from different industries.
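For readers unfamiliar with how an industry definition can be made finer, the sketch below illustrates the hierarchy of GICS codes: an 8-digit code whose first 2, 4, 6, and 8 digits identify the sector, industry group, industry, and sub-industry. The function name and the example codes are chosen for illustration, and nothing here reproduces the study's actual sample construction.

```python
# Number of leading digits that identify each GICS level.
GICS_LEVELS = {
    "sector": 2,
    "industry_group": 4,
    "industry": 6,
    "sub_industry": 8,
}

def finest_shared_level(code_a, code_b):
    """Return the finest GICS level at which two 8-digit codes agree, or None."""
    shared = None
    for name, digits in GICS_LEVELS.items():
        if code_a[:digits] == code_b[:digits]:
            shared = name
        else:
            break  # levels are nested, so finer levels cannot match either
    return shared

# Example 8-digit codes used purely for illustration.
print(finest_shared_level("45103010", "45103020"))  # industry
print(finest_shared_level("45103010", "45201020"))  # sector
print(finest_shared_level("45103010", "10102010"))  # None
```

Grouping firm pairs by the finest level they share gives progressively narrower definitions of "same industry" against which a similarity measure can be compared.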

We then study the effects of compensation peers and consultants on firms’ disclosure similarity. We find that the focal firm’s compensation disclosures are, on average, much more similar to the disclosures of compensation-peer firms and of firms that share one or more compensation consultants. Interestingly, the level of similarity between peer firms’ compensation disclosures drops by approximately half when we account for peers’ being in the same industry. This suggests that some of the similarity in peer disclosures stems from being in the same industry. These results suggest that compensation disclosure decisions are correlated with firms being compensation peers or sharing compensation consultants, both of which are relatively less costly firm decisions. The effect size for compensation peers is the largest we find across the factors we study.

Overall, our findings provide a more nuanced view of how firms design and justify executive compensation, beyond just the quantitative features. We extend the investigation of economic determinants of compensation to narrative compensation disclosures by demonstrating that our measure of semantic similarity in compensation disclosures increases with firm size, industry similarity, disclosed compensation peers, and common compensation consultants. Our study is the first, to our knowledge, to demonstrate the value and flexibility of sophisticated document embedding methods in the business field. Embedding-based approaches provide a richer representation of text and hence can play an important role in future research employing textual analysis in the social sciences. Studying public compensation disclosures also allows us to study executive compensation for smaller firms that are often overlooked in the literature, since they are not covered by commercial data providers focusing on larger firms.

This post comes to us from professors Maclean Gaulin and Xiaoxia Peng at the University of Utah’s David Eccles School of Business. It is based on their recent paper, “Compensation Disclosure: A Study via Semantic Similarity,” available here.

The CLS Blue Sky Blog

Columbia Law School's Blog on Corporations and the Capital Markets
