What’s wrong with ESG ratings?

About a year ago, the Brookings Institution held a panel discussion regarding the role that the SEC should play in ESG investing and invited SEC Commissioner Hester Peirce to speak at the panel. It’s well known, of course, that she is not exactly a fangirl of ESG in any of its manifestations, and she came prepared to engage, armed with a voluminous speech consisting of 10 theses, footnoted to the hilt. One of her theses was that figuring out what “good” means in the context of ESG is very subjective—that’s why, she said, there’s a lot of debate over best ESG practices and that’s especially why ESG ratings firms are so inconsistent in their results. (See this PubCo post.) There may be even more to it than that.  This new paper, ESG ratings—a compass without direction, from the Rock Center for Corporate Governance at Stanford University, looks at ESG ratings and examines issues about their reliability. The authors conclude that, “while ESG ratings providers may convey important insights into the nonfinancial impact of companies, significant shortcomings exist in their objectives, methodologies, and incentives which detract from the informativeness of their assessments.” 

The authors contend that demand for ESG information—from institutional and other investors, companies, regulators and other stakeholders—is so great that it has “outstripped the ability of suppliers to supply the depth, detail, and accuracy of data required. This is perhaps due to the immense number of factors that plausibly fall under the heading of ESG, the difficulty in measuring ESG factors, and the daunting challenge of determining their impact.” The use of ratings information is also impaired by the “lack of comparability across firms, lack of standards, the cost of gathering information, and a lack of quantifiable information.”

To address the need for ESG information, commercial ESG rating services have developed. According to the paper, there are dozens of ratings providers, some of them owned by well-known companies such as ISS, Thomson Reuters, Moody’s and Morningstar. And, it turns out, investment professionals are highly reliant on these services, with surveys showing that up to 88% of investment professionals use third-party ESG ratings as a part of their investment process. The authors cite a bank analysis showing that “over $200 billion was invested in ESG bond funds between 2019 and 2022.” (According to The Economist, “the titans of investment management claim that more than a third of their assets, or $35trn in total, are monitored through one ESG lens or another.”)

ESG ratings are supposed to measure “ESG quality.” But is there clear agreement on what that means?  The authors posit two alternative interpretations: under one view, ESG quality assesses “the impact a company has on the welfare of its stakeholders, such as employees, suppliers, customers, local community, and the environment. Under this definition, a company can improve its ESG profile by withdrawing from activities that are harmful to stakeholders or improving business practices in affected areas to benefit these constituents.” Under this alternative, the shareholders bear the brunt of the short-term costs, while, according to the authors, the “long-term financial impact to the company is undetermined or unstated.” (Some might argue—hope—that shareholders tend to benefit in the long run.)  This “doing good” perspective, the authors suggest, “is what most individual investors likely think of when they think about ESG quality.”

The alternative interpretation of ESG quality is that “ESG measures the impact societal and environmental factors have on the company, and that these factors are financially material. Under this definition, an ESG framework provides a set of risk factors that the company can plan for or mitigate through strategic planning, targeted investment, or a change in operating activity. Addressing ESG risk factors, even if costly in the short run, is expected to result in a long-term financial benefit to the corporation and its shareholders. This view of ESG (the impact of environmental and social risks on financial performance) is the one predominantly adopted by ESG ratings providers.”

The authors demonstrate the tension between these two alternatives in quotes from a Bloomberg BusinessWeek article regarding ratings firm MSCI: “‘There’s virtually no connection between MSCI’s “better world” marketing and its methodology. That’s because the ratings don’t measure a company’s impact on the Earth and society. In fact, they gauge the opposite: the potential impact of the world on the company and its shareholders. MSCI doesn’t dispute this characterization. It defends its methodology as the most financially relevant for the companies it rates.’ According to the article, ‘MSCI’s CEO concedes ordinary investors piling into such funds have no idea that his ratings, and ESG overall, gauge the risk the world poses to a company not the other way around. “No, they for sure don’t understand that,” he said in an interview.’” The authors of the paper ask whether retail investors that purchase ESG funds to ensure that “their investments reflect certain societal values or environmental standards” are aware that the “ESG ratings used to create these portfolios do not necessarily attempt to measure a company’s commitment to those values or standards? Should ESG fund managers disclose this?”

The authors note that the SEC “also mixes these views of ESG. The preface to its draft rule on climate-related disclosure states that ‘climate risk can pose significant financial risks to companies, and investors need reliable information about climate risks to make informed investment decisions.’ The preponderance of the draft rule, however, specifies disclosure on the size of corporate emissions, which reflect the company’s impact on the environment rather than the impact of the environment on the company.” (See this PubCo post on the SEC release.)

Although, in general, the ESG ratings firms are geared “to provide insight into ESG quality,” they employ different approaches and articulate different objectives. Many of them aim to reduce investment risk, that is, “reducing social and environmental factors that pose risk to the company’s business model or operations,” thereby ultimately improving “financial performance or reduc[ing the] likelihood of regulatory violations, litigation, or bankruptcy.” Some actually predict improved returns, and some make more nebulous claims to measure “environmental or social impact,” “transparency and commitment to ESG,” or  “provide a screen for ESG selection in support of stewardship goals.”  All of the raters provide letter or numeric scores, some on an absolute basis and others on an industry-relative basis.

To compute these scores, as described by the authors, ratings firms assess the three components of ESG, looking at a variety of subcomponents in each category that they may select independently or base on the frameworks of other organizations, such as SASB. These assessments are then aggregated into an overall score. The data sources used may be public, quasi-public or private data, including company responses to solicited questionnaires. The number of subcomponents assessed varies widely, ranging up to 1,000 metrics for some ratings firms. But, the authors observe, this huge number of variables creates issues by itself, requiring the ratings firms to make a variety of judgments, including materiality assessments and related “weighting” of factors, potential absence of relevant data, standardization of variables (which may be reported by companies differently) to provide comparability across companies, and weighting of “both the variables in their importance to E, S, and G, and also the overall pillars of E, S, and G in relation to one another.”
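To make the effect of these weighting judgments concrete, here is a minimal sketch of a two-level roll-up: metric scores are combined into pillar scores, and pillar scores into one overall score. It is not any ratings firm's actual methodology; the metric names, scores, weights and the helpers `weighted_average` and `overall_score` are all hypothetical.

```python
# Illustrative only: every metric name, score, and weight below is invented.

def weighted_average(scores, weights):
    """Weighted mean of `scores`, using the matching entries in `weights`."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def overall_score(metrics_by_pillar, metric_weights, pillar_weights):
    """Roll metric scores up into pillar scores, then into one overall score."""
    pillar_scores = {
        pillar: weighted_average(metrics, metric_weights[pillar])
        for pillar, metrics in metrics_by_pillar.items()
    }
    return weighted_average(pillar_scores, pillar_weights), pillar_scores


# One hypothetical company, scored 0-100 on a handful of metrics.
company = {
    "E": {"emissions": 40, "waste": 60},
    "S": {"safety": 80, "turnover": 50},
    "G": {"board_independence": 90},
}
metric_weights = {
    "E": {"emissions": 0.7, "waste": 0.3},
    "S": {"safety": 0.5, "turnover": 0.5},
    "G": {"board_independence": 1.0},
}

# Two hypothetical raters that differ only in how they weight the pillars.
rater_a_pillars = {"E": 0.5, "S": 0.3, "G": 0.2}
rater_b_pillars = {"E": 0.2, "S": 0.3, "G": 0.5}

score_a, pillars = overall_score(company, metric_weights, rater_a_pillars)
score_b, _ = overall_score(company, metric_weights, rater_b_pillars)
print(pillars)
print(round(score_a, 1), round(score_b, 1))
```

With these made-up inputs, identical underlying data produces an overall score of about 60.5 under the first set of pillar weights and about 73.7 under the second, which illustrates how much room the weighting step alone leaves for judgment.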

It’s not hard to see why there are consistency issues across the ratings firms. For example, ratings firms may take different approaches to dealing with missing data. According to the paper, some omit missing data points, while others assume answers based on industry averages or based on the worst case, or based on an estimation “using advanced statistical techniques to impute the missing value.” In addition, companies often report information based on different scales, which the ratings firm then has to standardize: “For example, one company might report workplace safety information using raw numbers (number of incidents), a time scale (injuries per unit of time worked), or a percentage scale (lost-time frequency).”  In addition, the authors maintain that ratings firms seek to improve the performance of their models by making “retroactive adjustments to historical data. For example, the data included in a model five years ago might not be the same as the data in the model today for that same year. Data changes are made to improve the accuracy of models, as new or better data is made available. However, they have the effect of making a model look more predictive than it was. Revising past data based on observed subsequent outcomes can invalidate the results from back testing. This is an important concern when evaluating the predictability and validity of commercial ESG ratings.”
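The missing-data approaches described above (omitting the data point, assuming the industry average, or assuming the worst case) can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of any particular ratings firm: the metric names and values are invented, the helpers `fill_missing` and `simple_score` are hypothetical, higher scores are assumed to be better, and the statistical imputation option is left out.

```python
from statistics import mean

# Illustrative only: metric names, scores, and peer values are invented.

def fill_missing(value, peer_values, strategy):
    """Return the value to use for one metric, or None if it is dropped."""
    if value is not None:
        return value
    if strategy == "omit":
        return None
    if strategy == "industry_average":
        return mean(peer_values)
    if strategy == "worst_case":
        return min(peer_values)  # assumes a higher score is better
    raise ValueError(f"unknown strategy: {strategy}")


def simple_score(metrics, peers, strategy):
    """Unweighted mean of the metrics that remain after handling gaps."""
    filled = [
        fill_missing(value, peers[name], strategy)
        for name, value in metrics.items()
    ]
    return mean(v for v in filled if v is not None)


# A hypothetical company that did not report its safety metric.
metrics = {"emissions": 70, "safety": None, "board": 80}
peers = {"emissions": [50, 60, 70], "safety": [40, 60, 50], "board": [70, 80, 90]}

scores = {
    strategy: simple_score(metrics, peers, strategy)
    for strategy in ("omit", "industry_average", "worst_case")
}
for strategy, score in scores.items():
    print(strategy, round(score, 1))
```

In this toy example, a single unreported metric moves the company's score from 75 (gap omitted) to roughly 66.7 (industry average) or 63.3 (worst case), so two raters working from the same disclosures could diverge before any weighting is applied.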

When evaluating ESG ratings for quality, consistency and effectiveness, the paper makes several observations. First, investors seem to have a relatively low opinion of ESG ratings, reflecting a lack of confidence in their methodologies and reliability. The authors cite a 2020 study of institutional investors that revealed “widespread concerns, including inaccuracy and inconsistency of data, inexperienced research analysts, and a perception that ESG quality cannot be distilled to a score.” The authors also identified systemic patterns in ratings related to company size, industry and country. For example, the authors found that larger companies scored higher on average than smaller companies, which may reflect greater resources invested in ESG or more disclosure of ESG information. In addition, some industries tended to score higher on average, as did European companies. Based on recent research, the authors also pointed to an upward trend in ratings: an 18% improvement for companies in the Russell 1000 over the period from 2015 through 2021. This improvement was attributable in part to structural changes, such as changes in the index composition, but primarily just to “grade inflation.”

And, no surprise here given the differences in methodology, various studies cited by the authors indicated “low correlations across ESG ratings providers,” even though “ESG ratings are supposed to measure the same construct.” One study showed wide variations not only for the ratings, but also for the individual ESG components. (In one case, comparing assessments by two different rating firms, “assessments of the E, S, and G components…exhibit correlations of only 0.11, 0.18, and -0.02, respectively.”) A far cry from the 99% correlation for credit ratings! One study, attempting to understand the reasons for the wide divergence, found “that differences in measurement (56 percent) and scope (38 percent) account for most of the divergence, with weighting differences accounting for just 6 percent of the variance.” Another study found that “corporate disclosure does not reduce the divergence of ESG ratings but instead increases it,” perhaps because of the subjective nature of the information.
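For readers who want a sense of what these correlation figures measure, the sketch below computes a Pearson correlation between two raters' scores for the same set of companies. The scores are invented for illustration and are not drawn from the paper or from any ratings provider; the `pearson` helper is written out by hand only to keep the example self-contained.

```python
from math import sqrt

# Illustrative only: the scores below are invented, not real ESG ratings.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical scores assigned to the same five companies by two raters.
rater_a = [80, 65, 50, 72, 40]
rater_b = [55, 70, 60, 45, 58]

# A third hypothetical rater that ranks companies exactly as rater A does,
# just on a different scale.
rater_c = [0.5 * score + 10 for score in rater_a]

r_ab = pearson(rater_a, rater_b)
r_ac = pearson(rater_a, rater_c)
print(round(r_ab, 2), round(r_ac, 2))
```

A value near 1, as for the first and third raters here, means the two sets of scores move together almost perfectly, which is the situation described for credit ratings. A value near zero or below, as for the first and second raters, means that knowing one rater's score tells you little about the other's.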

What do these differences mean for users? The authors identify three potential consequences of these low correlations: “One is the potential to confuse investment decisions by giving unreliable information about the ESG quality of firms. Another is that it confuses the disclosure that fund managers make to investors about the overall ESG quality of their portfolio. A third is that it reduces the incentive of companies to improve their ESG performance by sending unreliable signals about how their ESG initiatives are assessed by third-party observers.”

Studies cited by the authors also found minimal relationships between ESG ratings and environmental and social outcomes.  For example, an analysis by Bloomberg cited in the paper found that “most upgrades occur for what Bloomberg calls ‘rudimentary business practices’ rather than substantive improvements….Upgrades were often driven by check-the-box practices, such as conducting an employee survey that might reduce turnover, and rarely for substantial practices, such as an actual reduction in carbon emissions. Half of companies were upgraded for doing nothing—the result of methodological changes.” Another study found that companies rated highly enough to be part of ESG portfolios had “worse records for compliance with labor and environmental laws relative to companies in non-ESG portfolios during the same period.”

And finally, the authors found that the relationship between financial performance and ESG ratings is “uncertain.” Some studies showed that ESG scores “might be predictive of future risk,” while others found an inverse relationship between high sustainability ratings and performance, that is, “funds with low sustainability ratings perform[ed] better than those with high ratings.” Still others found that companies with high ESG ratings “perform better during good economic times but worse during bad economic times,” that ESG indexes performed better prelaunch and worse after launch, that aggregating multiple ESG ratings might demonstrate some correlation with performance, or that the “financial performance of ESG investing has on average been indistinguishable from conventional investing.”

In conclusion, the authors observe that, although the “purpose of ESG ratings is to provide information to market participants about the quality of a company’s ESG program and potential risks that might arise due to societal or environmental exposure,” it’s not at all clear that these complex models either “predict investment risk or return” or “capture or predict improvements in stakeholder outcomes.” “What is the source of this failure?” the authors ask. “Is it due to methodological choices these firms make? Or is it due to the sheer challenge of measuring a concept as broad and all-encompassing as ‘ESG?’” Would “more expansive corporate disclosure improve the reliability of ESG ratings” or just add to the noise? “Is it possible for companies to effectively report on the vast number of potential stakeholder-related metrics that would be required (carbon emissions, pollution and waste, human capital management, supply chain practices, product use and safety, etc.)?” Similarly, should ESG ratings be subject to regulatory requirements similar to those applicable to major credit rating agencies?

