MAB Thesis Prize (MAB-scriptieprijs)
Corresponding author: Thomas Archer (thomasarcher2000@gmail.com)
Academic editor: René Orij
© 2025 Thomas Archer.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits to copy and distribute the article for non-commercial purposes, provided that the article is not altered or modified and the original author and source are credited.
Citation:
Archer T (2025) Assessing the influence of green innovation on ESG ratings: A machine learning approach across developed and emerging economies. Maandblad voor Accountancy en Bedrijfseconomie 99(3): 145-154. https://doi.org/10.5117/mab.99.135692
This study examines the role of Green Innovation in predicting ESG ratings across developed and emerging economies. Among 292 firms, Green R&D Intensity is identified as a key predictor of ESG ratings. Results indicate that companies currently make minimal investments in Green Innovation, meaning that modest increases in investment could enhance ESG ratings. Findings support Signaling Theory, suggesting Green Innovation can immediately boost ratings, though long-term impacts may require time to mature. The study also shows that integrating Green Innovation into ML models reduces prediction error by 2%, rising to 11.5% for firms without prior ESG ratings. Ultimately, the study’s implications underscore the importance of ESG factors for firms, investors, and policymakers, as higher ESG ratings are linked to increased firm value, improved performance, and economic growth.
ESG Ratings, Green Innovation, Green R&D Intensity, Machine Learning
For firms, focusing on Green R&D spending is crucial for enhancing ESG ratings, particularly for those without prior ratings, as it reduces capital costs and improves financial performance. For investors, incorporating Green R&D Intensity into investment models reduces prediction error, lowers investment risk, and supports better long-term returns. For policymakers, the findings encourage policies that promote Green R&D spending universally, enhancing ESG practices and contributing to economic growth.
In recent years, sustainable investment has grown rapidly, with ESG-focused assets under management now exceeding $17.5 trillion globally (
How does integrating Green Innovation into machine learning models enhance the prediction accuracy of ESG ratings, and how does the impact of Green Innovation vary between developed and emerging economies?
This study evaluates two proxies for Green Innovation: (1) Green R&D Intensity and (2) General R&D Intensity to determine which best explains variations in ESG ratings. The Random Forest model, frequently used in ESG research (Del Vitto et al. 2023;
The study’s contributions include (1) identifying Green R&D Intensity as a superior predictor for ESG ratings compared to General R&D; (2) demonstrating that machine learning models with Green Innovation inputs reduce prediction errors, particularly for firms without prior ESG ratings; and (3) showing that Green Innovation’s predictive power holds across economic contexts, supporting the broad applicability of these models. This study provides insights for firms to reduce capital costs by enhancing ESG ratings through Green Innovation (
ESG ratings evaluate firms’ sustainability across environmental, social, and governance pillars (
The ESG rating industry has seen significant growth, utilizing data from company disclosures, media, and regulatory filings (
| A study by Alves ( |
This study enhances the model by Chowdhury (
Green Innovation, or eco-innovation, refers to innovations in green products and processes that offer environmental benefits (
Green R&D influences all ESG components:
While models like Chowdhury (
H1: Green R&D better explains the variation in ESG ratings than General R&D.
Advancements in ESG rating prediction are driven by machine learning algorithms like Random Forest, XGBoost, and Neural Networks, which better handle complex, nonlinear data than traditional methods (
This study addresses these challenges by incorporating Green Innovation, a key indicator of sustainability commitment, into Chowdhury’s (
H2: Integrating Green Innovation into advanced machine learning models improves ESG rating prediction accuracy compared to traditional statistical models.
While most ESG studies focus on developed economies, recent research underscores the importance of understanding ESG in emerging markets (
Given these disparities, Green Innovation investments could represent a significant departure from average ESG practices in emerging economies, signaling a stronger commitment to sustainability and potentially having a more pronounced impact on ESG ratings. Building on the model of Chowdhury (
H3: The effect of Green Innovation on ESG ratings is stronger in emerging economies than in developed economies.
Note: In H2 and H3, “Green Innovation” will use the optimal proxy identified from H1.
ESG data: ESG ratings measure a firm’s sustainability performance across three dimensions: environmental, social, and governance. The LSEG ESG dataset covers over 90% of global market capitalization and has evaluated 15,500+ companies on 630+ metrics since 2002 (LSEG). Ratings are calculated from weighted Environmental (0.44), Social (0.31), and Governance (0.26) scores, using data from company reports and news sources. As
Firm-level indicators: fundamental financial data reflects long-term operational performance and ESG relevance (
Macroeconomic variables: Key macroeconomic indicators, based on Chowdhury (
Key transformations to ensure consistency across variables include calculating Green R&D Intensity and General R&D Intensity, lagging variables for 1–3 years, and normalizing data with log transformations (ESG, TotalAssets and GDP).
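The transformations described above can be sketched as follows. This is a minimal illustration on made-up records, not the study's pipeline: the field names, the sample values, and in particular the choice of revenue as the denominator of the intensity ratios are assumptions, since the text does not spell out how the intensities are computed.

```python
import math

# Hypothetical firm-year records; field names and numbers are illustrative only.
records = [
    {"firm": "A", "year": 2019, "green_rd": 1.0, "total_rd": 10.0, "revenue": 100.0, "esg": 50.0},
    {"firm": "A", "year": 2020, "green_rd": 2.0, "total_rd": 10.0, "revenue": 100.0, "esg": 55.0},
    {"firm": "A", "year": 2021, "green_rd": 3.0, "total_rd": 12.0, "revenue": 100.0, "esg": 60.0},
    {"firm": "A", "year": 2022, "green_rd": 4.0, "total_rd": 12.0, "revenue": 100.0, "esg": 65.0},
]


def build_features(records, max_lag=3):
    """Add intensity ratios, a log-transformed ESG score, and 1..max_lag year lags."""
    rows = []
    for r in records:
        row = dict(r)
        # ASSUMPTION: intensity = R&D spending / revenue (denominator not given in the text).
        row["green_rd_intensity"] = r["green_rd"] / r["revenue"]
        row["general_rd_intensity"] = r["total_rd"] / r["revenue"]
        row["log_esg"] = math.log(r["esg"])
        rows.append(row)

    # Index by (firm, year) so each lag is looked up within the same firm.
    by_firm_year = {(row["firm"], row["year"]): row for row in rows}
    for row in rows:
        for k in range(1, max_lag + 1):
            earlier = by_firm_year.get((row["firm"], row["year"] - k))
            # None marks a lag that falls outside the observed window.
            row[f"lag{k}_green_rd_intensity"] = (
                earlier["green_rd_intensity"] if earlier else None
            )
    return rows


features = build_features(records)
latest = features[-1]
print(latest["green_rd_intensity"], latest["lag1_green_rd_intensity"])
```

Rows whose lags are `None` would have to be dropped or imputed before estimation; which of the two the study did is not stated here.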
By standardizing the data and performing these calculations, the dataset is prepared for analysis. The initial dataset, sourced from LSEG and World Bank, comprised 267,778 observations. Observations were removed for countries not classified as developed or emerging by
| Description | Pre-Cleaning | Post-Cleaning |
|---|---|---|
| Time Frame | 2000–2023 | 2003–2022 |
| Number of Companies | 11,167 | 292 |
| Number of Countries | 46 | 26 |
| Total Observations | 267,778 | 1,597 |
| Observations from Developed Economies | - | 1,388 |
| Observations from Emerging Economies | - | 209 |
| Developed Economies | - | 15 countries |
| Emerging Economies | - | 11 countries |
Multicollinearity, which can inflate coefficient variances in regression, was evaluated using the Variance Inflation Factor (VIF). Most variables, especially those related to Green R&D Intensity, showed VIF values below 10, indicating minimal multicollinearity. However, lagged General R&D Intensity variables exceeded this threshold. Averaging the three lagged years (Lag1, Lag2, and Lag3) reduced the VIF to 6, supporting an averaged approach for these values to enhance model interpretability.
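To make the VIF check concrete, the sketch below computes VIF as 1 / (1 - R²) from regressing each predictor on the others, then shows how averaging three highly correlated lags lowers it. The data are synthetic and the variable names are hypothetical; it assumes NumPy is available and does not reproduce the study's reported values.

```python
import numpy as np


def vif(X):
    """Return the variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Synthetic stand-ins: three lags that share a persistent component, plus one unrelated control.
rng = np.random.default_rng(0)
persistent = rng.normal(size=500)
lag1 = persistent + 0.1 * rng.normal(size=500)
lag2 = persistent + 0.1 * rng.normal(size=500)
lag3 = persistent + 0.1 * rng.normal(size=500)
control = rng.normal(size=500)

vif_separate = vif(np.column_stack([lag1, lag2, lag3, control]))
vif_averaged = vif(np.column_stack([(lag1 + lag2 + lag3) / 3.0, control]))
print("separate lags:", vif_separate.round(1))
print("averaged lag: ", vif_averaged.round(1))
```

Averaging collapses the three collinear columns into one, which is why the averaged specification is easier to interpret, at the cost of no longer separating the individual lag effects.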
4.1.1. H1: Impact of green innovation on ESG ratings
The study tests the impact of Green R&D Intensity on ESG ratings, comparing it to General R&D Intensity. This is evaluated using OLS regression, with the base model containing only the Lagged ESG variable, as highlighted by Chowdhury (
$$\text{ESG}_{it} = \beta_0 + \beta_1 \text{Green Innovation}_{it} + \beta_2 \text{Lag1}_{it} + \beta_3 \text{Lag2}_{it} + \beta_4 \text{Lag3}_{it} + \beta_5 \text{Control Variables}_{it} + \varepsilon_{it}$$
Note: Green Innovation stands for either Green R&D Intensity or General R&D Intensity; the two are entered in separate regressions to capture their individual impacts. Control variables refer to the firm-level and macroeconomic indicators outlined in subsection 3.1. The goal is to identify whether Green or General R&D Intensity more effectively explains variations in ESG ratings.
Hypothesis 2 examines whether adding Green Innovation improves ESG rating predictions within machine learning models. Random Forest (RF) is the primary model due to its resilience to overfitting and noise, as well as its effective performance in ESG predictions (Del Vitto et al. 2023;
For robustness, the study also includes Artificial Neural Networks (ANNs), known for handling complex, nonlinear data well (Del Vitto et al. 2023), and eXtreme Gradient Boosting (XGBoost), which iteratively refines predictions to enhance accuracy (
To prevent overfitting and ensure balanced evaluations, data splitting is applied. The RF model uses an 80-20 train-test split, consistent with methods by Chowdhury (
To assess the influence of economic context on Green Innovation’s effect on ESG ratings, an interaction term is included in an OLS regression. This allows analysis of Green R&D’s varying impact across developed and emerging economies.
$$\text{ESG}_{it} = \beta_0 + \beta_1 \text{Green Innovation}_{it} + \beta_2 \text{Lag1}_{it} + \beta_3 \text{Lag2}_{it} + \beta_4 \text{Lag3}_{it} + \beta_5 \text{Econ}_{it} + \beta_6 (\text{Green Innovation}_{it} \times \text{Econ}_{it}) + \beta_7 \text{Control Variables}_{it} + \varepsilon_{it}$$
Note: "Econ" is a dummy variable indicating economic context, where 1 represents developed economies and 0 represents emerging economies. The term $\beta_5 \text{Econ}_{it}$ captures the baseline difference in ESG ratings across economic contexts, while the interaction term $\beta_6 (\text{Green Innovation}_{it} \times \text{Econ}_{it})$ tests whether the effect of Green Innovation on ESG ratings differs between developed and emerging economies. The model incorporates industry and year fixed effects.
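A stripped-down sketch of the interaction specification may help. It fits only the intercept, Green Innovation, the Econ dummy, and their product on synthetic data with known coefficients; the lags, controls, and fixed effects of the full model are deliberately omitted, and all names and numbers are hypothetical. NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

green = rng.uniform(0.0, 1.0, size=n)   # stand-in for Green Innovation
econ = rng.integers(0, 2, size=n)       # 1 = developed, 0 = emerging

# Known coefficients used to generate the synthetic outcome:
# intercept, Green Innovation, Econ, Green Innovation x Econ.
true_beta = np.array([1.0, 0.5, -0.2, 0.3])

X = np.column_stack([np.ones(n), green, econ, green * econ])
y = X @ true_beta + rng.normal(scale=0.05, size=n)

# Ordinary least squares via a least-squares solve.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The slope on Green Innovation depends on the dummy:
effect_emerging = beta_hat[1]                 # Econ = 0
effect_developed = beta_hat[1] + beta_hat[3]  # Econ = 1

print("estimated coefficients:", beta_hat.round(3))
print("slope (emerging): ", round(float(effect_emerging), 3))
print("slope (developed):", round(float(effect_developed), 3))
```

The interaction coefficient is exactly the gap between the two slopes, so testing whether it is zero is the test of whether the effect differs by economy type.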
Using OLS regression, this analysis assesses Green R&D Intensity and General R&D Intensity as predictors of ESG ratings, with Lagged ESG as the core predictor (following Chowdhury (
Regression Results for LogESG with Green R&D Intensity and General R&D Intensity.
| Variable | Base Model | + Fixed Effects | + Controls | + R&D Intensity | + Lag 1 | + Lag 2 | + Lag 3 |
|---|---|---|---|---|---|---|---|
| Panel A: GREEN R&D Intensity | | | | | | | |
| Dependent Variable: Log ESG Ratings | | | | | | | |
| R-squared (Adjusted) | 0.810 | 0.798 | 0.812 | 0.813 | 0.815 | 0.815 | 0.816 |
| LogLagged_ESG | 0.254*** | 0.253*** | 0.230*** | 0.229*** | 0.230*** | 0.230*** | 0.230*** |
| LogTotalAssets | - | - | 0.033*** | 0.033*** | 0.033*** | 0.033*** | 0.033*** |
| GDPG | - | - | 0.017*** | 0.017*** | 0.017*** | 0.017*** | 0.017*** |
| UNEM | - | - | 0.015*** | 0.015*** | 0.015*** | 0.015*** | 0.015*** |
| DER | - | - | -0.008** | -0.008** | -0.008** | -0.008** | -0.008** |
| EPS | - | - | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| TIE | - | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| LogGDP | - | - | -0.010*** | -0.010*** | -0.010*** | -0.010*** | -0.010*** |
| Green R&D Intensity | - | - | - | 0.007** | 0.032*** | 0.032*** | 0.032*** |
| Lag1 Green R&D Intensity | - | - | - | - | -0.027*** | -0.027*** | -0.027*** |
| Lag2 Green R&D Intensity | - | - | - | - | - | -0.002 | -0.008 |
| Lag3 Green R&D Intensity | - | - | - | - | - | - | 0.009 |
| Panel B: GENERAL R&D Intensity | | | | | | | |
| Dependent Variable: Log ESG Ratings | | | | | | | |
| R-squared (Adjusted) | 0.810 | 0.798 | 0.812 | 0.812 | 0.812 | 0.812 | 0.812 |
| LogLagged_ESG | 0.254*** | 0.253*** | 0.230*** | 0.230*** | 0.230*** | 0.230*** | 0.230*** |
| LogTotalAssets | - | - | 0.033*** | 0.033*** | 0.033*** | 0.033*** | 0.033*** |
| GDPG | - | - | 0.017*** | 0.017*** | 0.017*** | 0.017*** | 0.017*** |
| UNEM | - | - | 0.015*** | 0.015*** | 0.015*** | 0.015*** | 0.015*** |
| DER | - | - | -0.008** | -0.008** | -0.008** | -0.008** | -0.008** |
| EPS | - | - | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| TIE | - | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| LogGDP | - | - | -0.010*** | -0.010*** | -0.010*** | -0.010*** | -0.010*** |
| R&D Intensity | - | - | - | 0.003 | 0.003 | 0.003 | 0.003 |
| Lag1 R&D Intensity | - | - | - | - | -0.005 | -0.005 | - |
| Lag2 R&D Intensity | - | - | - | - | - | -0.000 | - |
| Lag3 R&D Intensity | - | - | - | - | - | - | - |
| Lag1_3 R&D Intensity | - | - | - | - | - | - | -0.060 |
Conclusion: the analysis finds that Green R&D Intensity significantly explains ESG ratings and does so more effectively than General R&D Intensity.
Specifically, a 1% increase in Green R&D Intensity correlates with a 0.032% increase in ESG ratings, and the effect extends to some lagged values, suggesting both immediate and longer-term impacts. The first lag (Lag1) initially reduces ESG ratings (-0.027%), but the overall effect turns positive as the investments mature. Conversely, General R&D Intensity shows no significant immediate or lagged effects on ESG ratings, underscoring that targeted Green R&D investments are more impactful for ESG outcomes.
Green R&D Intensity outperforms General R&D Intensity as a predictor of ESG ratings across sectors, particularly within Consumer Discretionary, Basic Materials, and Technology, where it significantly enhances model explanatory power. In contrast, Green R&D Intensity shows no significant effect in Utilities and Industrials, indicating sector-specific differences in R&D impacts on ESG performance.
General R&D Intensity consistently lacks significance across sectors, suggesting that targeted Green R&D efforts, focused on environmental improvements, align more closely with factors influencing ESG ratings. These robustness tests reinforce that Green R&D Intensity is a more effective predictor of ESG ratings than General R&D Intensity.
The OLS regression analysis (Table
Models with Green R&D Intensity demonstrate higher adjusted R-squared values, emphasizing its role in sustainable value creation and informing corporate strategy and policy.
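For reference, the adjusted R-squared used in this comparison follows the standard definition below, where $n$ is the number of observations and $p$ the number of regressors excluding the intercept (fixed-effect dummies count towards $p$). The symbols are notation introduced here, not taken from the article.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Because the penalty grows with $p$, an added regressor raises $\bar{R}^2$ only if it improves the fit by more than the lost degree of freedom costs. This is why the rise from 0.812 to 0.816 in Panel A, against a flat 0.812 in Panel B, is informative about the two R&D measures.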
Green R&D Intensity also shows a nuanced impact over time; initial investments may reduce ESG ratings due to upfront costs but have positive effects in subsequent periods, consistent with Song (
Additionally, a negative relationship between GDP and ESG ratings suggests that economic growth may conflict with sustainability, highlighting an area for further research.
In summary, Green R&D Intensity proves a stronger predictor of ESG ratings than General R&D, with even minimal investments yielding positive ESG impacts over time. Specifically, a 1% increase in Green R&D correlates with a 0.032% rise in ESG ratings, reinforcing the value of sustainability-focused innovation.
This section explores how adding Green R&D Intensity as a proxy for Green Innovation influences ESG rating predictions using machine learning models. Following H1 findings, Green R&D Intensity was selected as the primary innovation metric. The Random Forest (RF) model serves as the main approach, with Artificial Neural Networks (ANN) and eXtreme Gradient Boosting (XGBoost) as robustness checks.
Conclusion: incorporating Green Innovation enhances ESG prediction accuracy, particularly when no prior ESG data is available.
Performance metrics for the Random Forest (RF) model were evaluated with and without the inclusion of Green Innovation. Results indicate that adding Green Innovation improves prediction accuracy: MSE decreased from 0.013 to 0.012, RMSE from 0.113 to 0.111, and MAPE from 1.97% to 1.94%, reflecting more precise predictions.
The inclusion of Green Innovation further shows a reduction in prediction error, with RMSE decreasing by 1.77% and MAPE by 1.52%, demonstrating the model’s enhanced performance.
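The error metrics and the relative reductions quoted above can be checked with a few lines of Python. The metric functions follow the standard definitions; the function names are chosen here for illustration, and the only inputs are the RMSE and MAPE values reported in the text.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(before, after):
    """Percentage by which a metric falls when moving from `before` to `after`."""
    return 100.0 * (before - after) / before


# Reported Random Forest metrics without and with Green Innovation.
rmse_reduction = relative_reduction(0.113, 0.111)
mape_reduction = relative_reduction(1.97, 1.94)

print(round(rmse_reduction, 2))  # 1.77
print(round(mape_reduction, 2))  # 1.52
```

Both figures match the reported reductions, which confirms that they are relative changes in the metric rather than percentage-point differences.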
Analysis confirms Lagged ESG as the most critical feature, aligning with H1 results. Green R&D Intensity also has notable importance, underscoring its role in enhancing ESG prediction accuracy. Further robustness checks will also examine the impact of Lagged ESG to ensure reliability.
This section analyzes the impact of including the Green Innovation factor across different machine learning models, specifically Artificial Neural Networks (ANN) and eXtreme Gradient Boosting (XGBoost). Subsequently, the impact of excluding the lagged ESG factor will be examined.
Artificial Neural Network and eXtreme gradient boosting
Three instances of the Artificial Neural Network (ANN) model were tested, showing that the inclusion of Green Innovation improved performance metrics, with reductions in MSE, RMSE, and MAPE. This improvement aligns with results from the Random Forest model, supporting the addition of Green Innovation in predictive models (Del Vitto et al. 2023).
For eXtreme Gradient Boosting (XGBoost), results across scenarios (excluding and including Green Innovation) indicate slight enhancements in MSE and RMSE with Green Innovation. While these differences are marginal, the MAPE value remained mostly unchanged, suggesting further investigation may be warranted (
Excluding lagged ESG
Given the high importance of the Lagged ESG variable found during analysis, examining model performance without this feature provides insights, especially for new companies or rapidly evolving industries where historical ESG data might be lacking or unrepresentative.
Results show that, when Lagged ESG is removed, including Green Innovation reduces prediction error (RMSE) by 11.39%, underscoring Green Innovation’s value in improving model accuracy without prior ESG data. This highlights Green Innovation’s potential for accurate ESG predictions in emerging or transforming sectors.
The OLS regression results in Table
The inclusion of Green Innovation consistently improves the Random Forest model’s performance metrics (R2, MSE, RMSE, and MAPE), with prediction errors decreasing by 2% overall. Although ANN shows the highest relative improvement, XGBoost and Random Forest yield the best absolute error reductions. Notably, XGBoost exhibits a higher MAPE, supporting the argument of
Analysis confirmed that lagged ESG ratings are the most influential predictor, aligning with Chowdhury (
Figure
While Chowdhury (
Unlike this study, Del Vitto et al. (2023) leveraged the full ESG Asset4 dataset; discrepancies across regions (USA, Europe, China) suggest the need for further investigation into regional differences. This sets up Hypothesis 3, which explores Green Innovation’s differential impact across developed and emerging economies.
Conclusion: results show no significant difference in the impact of Green R&D Intensity (Green Innovation proxy) on ESG ratings between developed and emerging economies, thus not supporting Hypothesis 3.
To examine economic context differences, an OLS regression with interaction terms was conducted. The model included Green Innovation, lagged terms, and an interaction for Green Innovation with a developed economy indicator. Table
OLS Regression Results: Green Innovation with Interaction and Year Fixed Effects.
| Adjusted R-squared | 0.816 |
|---|---|
| Variable | Coefficient |
| LogLagged_ESG | 0.230*** |
| LogTotalAssets | 0.032*** |
| GDPG | 0.015** |
| UNEM | 0.012** |
| DER | -0.008** |
| EPS | 0.002 |
| TIE | 0.000 |
| LogGDP | -0.008** |
| Green Innovation | 0.021** |
| Lag1 Green Innovation | -0.026*** |
| Lag2 Green Innovation | -0.009 |
| Lag3 Green Innovation | 0.009** |
| Green Innovation × | |
| Developed Economy | 0.012 |
| Developed Economy | -0.014 |
The findings confirm Hypotheses 1 and 2, with the Lagged ESG coefficient (0.230) indicating that past ESG performance is a strong predictor of current ESG ratings. Green Innovation has a positive and significant effect on ESG ratings, consistent across both developed and emerging economies. Lag effects are also observed, with varying significance.
Including the interaction term shows that Green Innovation’s impact does not significantly differ between economy types. The coefficient for Developed Economy is negative but not significant, indicating no substantial difference in ESG ratings across economic classifications.
Significant economic variables, such as LogGDP, GDPG, and UNEM, indicate that economic conditions influence ESG ratings, though the developed/emerging classification adds no further explanatory power. Overall, the model explains 81.6% of ESG rating variance, as indicated by an adjusted R-squared of 0.816.
A subsample analysis of interaction terms between Green Innovation and economy type across industries confirms that Green Innovation’s impact on ESG ratings is not significantly different across developed and emerging economies. These robustness results support the main findings, leading to the rejection of Hypothesis 3.
Following Song’s (
Results indicate an insignificant interaction effect, suggesting that Green Innovation’s impact on ESG ratings does not vary significantly between economy types. The positive coefficient suggests a slight tendency for Green Innovation to benefit ESG ratings more in developed economies, though not significantly so.
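The size of that tendency can be read directly off the interaction specification: the slope on Green Innovation is $\beta_1$ for emerging economies and $\beta_1 + \beta_6$ for developed ones. The worked values below plug in the reported coefficients for Green Innovation (0.021) and the interaction term (0.012); the derivative notation is added here for illustration.

```latex
\begin{align*}
\frac{\partial\,\text{ESG}_{it}}{\partial\,\text{Green Innovation}_{it}}
  &= \beta_1 + \beta_6\,\text{Econ}_{it} \\
\text{Emerging } (\text{Econ}_{it} = 0):\quad & 0.021 \\
\text{Developed } (\text{Econ}_{it} = 1):\quad & 0.021 + 0.012 = 0.033
\end{align*}
```

Since the interaction coefficient of 0.012 is not statistically significant, the gap between 0.021 and 0.033 should not be read as evidence of a real difference between the two groups.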
Interestingly, emerging economies exhibit higher average ESG ratings over time, likely due to a selection bias that favors large-cap firms in these regions (
This study examines the impact of Green Innovation on ESG ratings, its role in enhancing machine learning predictive accuracy, and its differential effects across developed and emerging economies.
Analyzing data from 292 firms across 26 countries, Green R&D Intensity emerges as a more significant predictor of ESG ratings than General R&D Intensity, with consistent results across industries. The lagged effects of Green Innovation indicate a delayed but positive impact, emphasizing the need for a long-term perspective in sustainable investment.
Aligned with prior research (
Contrary to expectations, Green Innovation’s influence on ESG ratings does not differ significantly across economic contexts, with macroeconomic factors like LogGDP, GDPG, and UNEM playing a more substantial role than economic classification. This supports the applicability of the machine learning models across economic contexts.
The study contributes to ESG and Green Innovation literature in three ways: confirming Green R&D Intensity as a superior ESG predictor, demonstrating its value in reducing prediction errors in machine learning models, and revealing that broader economic classification has minimal impact on ESG predictions.
For firms, emphasizing Green R&D spending improves ESG ratings, aligning with studies like
This study has several limitations. First, the ESG data is solely from the LSEG database. Although
Future studies could address several areas to expand upon these findings. Firstly, using ESG data from multiple agencies could assess the robustness of results across different rating sources. Additionally, the significant negative impact of GDP on ESG ratings, a novel finding here, invites further exploration into how macroeconomic factors shape ESG practices. Selection bias, as discussed by Barkemeyer (
Potential research directions to build on these findings include analyzing ESG data from multiple rating agencies to validate these insights across datasets, investigating the GDP’s negative effect on ESG ratings to uncover broader macroeconomic influences, and exploring the specific investment stages and timeframes of Green Innovation. Such studies could enhance understanding of ESG ratings and support informed decision-making for investors and policymakers in sustainable investing.
T.J. Archer – Thomas
This article is based on Thomas’s master’s thesis, which was awarded the MAB Thesis Prize 2024. It is a condensed version, with key findings and conclusions preserved, while some details and data have been abbreviated to meet publication requirements.