Research Article
Corresponding author: Rens van Haasteren (renshaas@hotmail.com). Academic editor: Annemarie Oord
© 2025 Vincent Damen, Menno Wiersma, Gokce Aydin, Rens van Haasteren.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits copying and distributing the article for non-commercial purposes, provided that the article is not altered or modified and the original author and source are credited.
Citation:
Damen V, Wiersma M, Aydin G, van Haasteren R (2025) Explainable AI for EU AI Act compliance audits. Maandblad voor Accountancy en Bedrijfseconomie 99(4): 231-242. https://doi.org/10.5117/mab.99.150303
Internal auditors play a key role in ensuring artificial intelligence (AI) compliance with the EU AI Act. This article examines how Explainable AI (XAI) can play a critical role in assessing whether AI systems meet the specific requirements of transparency, human oversight, and fairness. When effectively implemented, XAI enables traceability, accountability, and intervention in AI decisions, and can be used as a tool by internal auditors. Effective AI compliance auditing requires an understanding of the methods for AI monitoring, the associated documentation, and user feedback mechanisms in order to assess risks, regulatory requirements, and ethical standards.
Artificial intelligence, internal audit, EU AI Act, Explainable AI, transparency, human oversight, fairness
While the EU AI Act does not mandate a role for the internal audit function in the oversight of AI systems, its contribution to ensuring compliance is increasingly recognized as essential. Internal auditors can assess whether XAI layers added to AI systems sufficiently address transparency, human oversight, and fairness requirements. XAI supports traceability and accountability, enabling effective risk evaluation.
High-risk applications of artificial intelligence (AI) underscore the critical need for reliable, accountable, and transparent AI systems. A clear example is found in credit risk assessment, where AI systems are used to determine whether individuals are eligible for loans. These decisions can significantly impact people’s lives, making it essential that such systems are explainable and subject to human oversight. Under the European Union’s Artificial Intelligence Act (EU AI Act), credit risk scoring is classified as high-risk and is therefore subject to strict transparency and accountability requirements. In fact, as this article will expand upon, all AI systems will (indirectly) need to adhere to some form of explainability requirement due to the EU AI Act.
The fundamental question this article tries to resolve is:
“Can an explainability layer help AI deployers comply with the EU AI Act’s transparency and oversight requirements, and how can internal auditors use it for compliance verification?”
The answer: it depends. Explainable AI (XAI) can support compliance by making model decisions more transparent and understandable. However, its effectiveness varies, as some methods oversimplify complex models or provide inconsistent interpretations. To be useful for internal auditors, explanations must be clear, reliable, and actionable, ensuring internal auditors can effectively assess compliance.
In a previous MAB article, we provided guidance on how internal auditors can build a framework to audit AI systems (
The European Regulation EU 2024/1689 was adopted on June 13, 2024 (
Core to the EU AI Act is a risk-based classification system for AI systems, with specific compliance requirements attached to each risk category. This classification categorizes AI systems according to their potential impact on health, safety, and fundamental rights, and emphasizes transparency and human oversight (
Key operators in the AI value chain have been defined by the EU’s regulatory framework, and are important to understand, as they determine compliance requirements based on role (
The following three sections of this chapter focus on transparency requirements, human oversight, and fairness. Transparency requirements within the EU AI Act necessitate that AI systems provide clear and understandable explanations for their decisions, thus calling for the use of an explainability layer to make AI systems’ decision-making processes more transparent. Human oversight ensures that AI systems are monitored based on model performance and explanations and can be corrected when necessary. Fairness principles seek to prevent biases in AI decision-making, and these biases can be identified and addressed using XAI techniques.
The EU AI Act is linked to the European Union’s General Data Protection Regulation (GDPR), which addresses concerns about the opacity of decision-making processes by automated systems. The GDPR includes provisions for automated decision-making (ADM) based on personal data, establishing protections and safeguards for individuals when subjected to decisions based solely on automated processing. (
In the EU AI Act, AI systems classified as High-risk require providers to supply comprehensive technical documentation. This includes a general description, intended use, and technical details such as system interaction with other hardware or software, and data used for training, including type and relevance (
AI systems that interact with natural persons must disclose their AI nature unless it is obvious (
The EU AI Act strengthens transparency by introducing complaint mechanisms and a right to explanation, allowing individuals affected by AI-driven decisions to seek redress and clarification. This complements the obligation to inform users when they are interacting with an AI system. Under Article 85 (
The complaint mechanism and the right to explanation create a need for explainability in AI decision-making, even though many models are too complex to provide clear justifications. As a result, it can be argued that the EU AI Act implicitly calls for the use of Explainable AI (XAI) techniques to ensure that AI decisions can be understood and communicated.
The human oversight mechanism was first established under the GDPR, granting data subjects the right not to be subjected to solely automated decisions involving the processing of personal data that result in legal or similarly significant effects. In cases where such decisions are made, appropriate human supervision and intervention – often referred to as the ‘human-in-the-loop’ principle – are required to safeguard fundamental rights and freedoms (
Building on this foundation, the EU AI Act states that high-risk AI systems must be designed to enable effective human oversight. This includes mechanisms that allow humans to understand, monitor, and control the operations of AI systems. The objective is to ensure that AI systems are not autonomous but operate under human governance, enabling necessary interventions and informed decisions (
Human oversight also extends to ethical and legal considerations. AI systems must be developed and operated in ways that respect fundamental rights and comply with applicable laws (
During the design and development phases of AI systems, features facilitating human oversight must be incorporated into AI systems (
The Act further mandates that providers must implement measures for continuous monitoring of high-risk AI systems’ operation (
Moreover, the EU AI Act outlines low-risk scenarios, stating that an AI system that does not materially influence the outcome of decision-making should be understood as one that does not affect the substance or result of a decision, whether human or automated (
It is argued that the “human-in-the-loop” model under the GDPR and “human-oversight-by-design” under the EU AI Act focus on ensuring human involvement, but do not define how to ensure that oversight is effective or competent (Laux 2023). Moreover, the obligations placed on AI developers to ensure proper oversight remain underdefined, leaving room for significant interpretation.
AI systems are required to comply with the fundamental principles outlined in the EU AI Act. Among these principles, fairness is a cornerstone of trustworthy AI, ensuring that automated decision-making processes do not perpetuate bias or discrimination (
While the EU AI Act does not have a single “Fairness” article, fairness principles are embedded in multiple provisions, particularly those related to bias mitigation (
The primary legal mechanism ensuring fairness is the requirement that high-risk AI systems must not result in discriminatory, biased, or unfair outcomes. Article 10 of the EU AI Act ensures fairness by requiring high-quality, bias-free, and representative training data for high-risk AI systems. In the next chapter, we will explore the implications of XAI techniques for ensuring fairness in AI systems.
AI is flexible in the way it transforms raw information (input data) into the model’s prediction (output data) by finding the best statistical fit, ensuring that the model captures the patterns in the data. A drawback is that it is not always directly clear what the exact relationship is, and why it is the way it is. Such an application is seen as a ‘black box’ by the developer and the user, who do not know what happens inside. Users in this context are defined as IT users within the deployer of the AI system. High-risk AI systems must be sufficiently transparent to enable deployers to interpret their output and must provide information that is relevant to explain or interpret that output, as suggested by the EU AI Act (
An explainability layer on top of an AI system, referred to as XAI, helps a user performing human oversight to explain and interpret the output. This is primarily relevant when the system uses black-box models, which can also be the case when using third-party tooling with proprietary models. Under the EU AI Act, it is not sufficient to explain only the overall functioning of the AI system (global); the output for a specific input also needs to be explained (local), as the right to complain requires a local explanation. The XAI explainability layer and techniques are therefore not only important for ensuring fairness, but also valuable for organizations seeking to comply with the transparency and human oversight requirements of the EU AI Act.
In addition to a technique giving global or local explanations, and the type of input data that the XAI technique can work with, there are other aspects that characterize specific XAI techniques, which should be considered when designing the aforementioned explainability layer:
The explainability of an outcome cannot be looked at on a standalone basis. It needs to be assessed together with the performance of the model and the stability of that performance. Low accuracy and/or low stability need to be reflected in the explanation to give a full understanding of the relationship and of how strong it is.
In addition, the way of implementing explainability needs to be suitable to the level of expertise of the users performing oversight, and the users need to be sufficiently trained. In many cases, this may require additional representation tooling on top of the core XAI techniques. Based on a literature study,
To illustrate the benefits and limitations of XAI, a specific banking application is developed in this section to estimate if a specific loan is likely to default or not, based on a set of features. The section afterwards demonstrates several XAI techniques given this credit risk model. Credit risk models are high-risk models under the EU AI Act, and as such the risk classification must be explainable due to the transparency requirements of the EU AI Act.
| Age | Sex | Job | Housing | Saving account | Checking account | Credit amount | Duration | Purpose | Risk |
|---|---|---|---|---|---|---|---|---|---|
| 67 | Male | Skilled | Own | None | Little | 1169 | 6 | Radio/tv | Good |
| 22 | Female | Skilled | Own | Little | Moderate | 5951 | 48 | Radio/tv | Bad |
| … | … | … | … | … | … | … | … | … | … |
After encoding the ordinal (ordered variables: ‘Saving account’, ‘Checking account’) and nominal (unordered variables: ‘Sex’, ‘Housing’, ‘Purpose’) variables, the data is randomly split into 80% training and 20% test data. A Random Forest Classifier (introduced by
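To make the setup concrete, the following is a minimal sketch of how such a model could be built in Python. It is illustrative only: the file name, the category labels, and their assumed ordering are assumptions and not taken from the article’s actual implementation.

```python
# Minimal sketch of the credit risk model described above (illustrative only:
# file name, category labels, and their assumed ordering are assumptions).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("german_credit.csv")  # columns as in the table above (assumed file name)

# Ordinal variables: map the account levels to an explicit order (assumed labels/order).
account_order = {"none": 0, "little": 1, "moderate": 2, "quite rich": 3, "rich": 4}
for col in ["Saving account", "Checking account"]:
    df[col] = df[col].astype(str).str.lower().map(account_order).fillna(0)

# Target: 1 = "Bad" risk (likely default), 0 = "Good".
y = (df["Risk"] == "Bad").astype(int)

# Nominal variables (and any other text columns, e.g. 'Job' if stored as text):
# one-hot encode everything that is not numeric yet.
X = pd.get_dummies(df.drop(columns=["Risk"]), dtype=int)

# 80% training / 20% test split, then fit the Random Forest classifier.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print("Test accuracy:", round(model.score(X_test, y_test), 3))
```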
Two of the most widely used XAI techniques are LIME and SHAP. LIME provides a local explanation and SHAP can give a local and a global explanation.
LIME uses a simple model to approximate a complex model (
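As an illustration, a local LIME explanation for a single loan from the credit risk model above could look as follows. This is a hedged sketch using the lime package, reusing X_train, X_test, and model from the earlier sketch.

```python
# Hedged sketch: a local LIME explanation for one loan from the test set,
# reusing X_train, X_test and model from the credit risk sketch above.
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=list(X_train.columns),
    class_names=["Good", "Bad"],
    mode="classification",
)

# Explain a single prediction (local explanation): which features pushed
# this specific loan towards "Good" or "Bad" risk?
explanation = explainer.explain_instance(X_test.values[0], model.predict_proba, num_features=5)
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.3f}")
```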
SHAP uses Shapley values that define the importance of an individual explanatory variable (feature), as the relative change in the output, with the specific feature included versus when it is excluded (
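A corresponding sketch for SHAP, again reusing the credit risk model, might look like the following. Note that the shape of the returned SHAP values differs between shap versions, which the snippet guards against.

```python
# Hedged sketch: local and global SHAP explanations for the credit risk model
# (reusing model and X_test from the earlier sketch).
import numpy as np
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Depending on the shap version, a binary classifier returns a list per class
# or a 3D array; keep the contributions towards the "Bad" (default) class.
sv = shap_values[1] if isinstance(shap_values, list) else shap_values
if getattr(sv, "ndim", 2) == 3:
    sv = sv[:, :, 1]

# Local explanation: feature contributions for a single loan application.
print(dict(zip(X_test.columns, np.round(sv[0], 3))))

# Global explanation: mean absolute contribution per feature over the test set.
global_importance = np.abs(sv).mean(axis=0)
for name, value in sorted(zip(X_test.columns, global_importance), key=lambda t: -t[1]):
    print(f"{name}: {value:.3f}")
```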
Table
| XAI technique | Objective | Type of input data | Global vs local | Assumes independence of features | Implementation difficulty | Clarity | Information density | Computational complexity | Remarks |
|---|---|---|---|---|---|---|---|---|---|
| LIME | Explaining individual predictions | Text Tabular Image | Local | No | Easy | Medium | High | Medium | Struggles with high-dimensional data |
| SHAP | Understanding global feature importance | Text Tabular Image | Both | No | Medium | Low | High | High | Mathematically grounded, based on cooperative game theory |
| Global surrogate model | Summarizing complex models | Tabular | Global | No | Medium | Medium | Medium | Medium | May oversimplify complex models |
| Anchors method | Explaining individual predictions with high-precision rules | Tabular | Local | No | Hard | High | Medium | High | Struggles with high-dimensional data. |
| Counterfactual explanation | Identifying actionable changes to outcomes | Text Tabular Image | Local | No | Hard | Low | Medium | High | Difficult to find useful explanations for high-dimensional data |
| Permutation Feature Importance (PFI) | Assessing feature importance | Tabular Image | Global | Yes | Easy | High | Low | Low | Assumes independence between features |
| Partial Dependence Plot (PDP) | Visualizing relationships between features and predictions | Tabular | Global | Yes | Easy | High | Low | Low | Suitable for uncovering average effects |
| Individual Conditional Expectation (ICE) | Exploring feature impact on specific instances | Tabular | Both | Yes | Easy | Medium | Medium | Low | Lacks scalability |
| Accumulated Local Effects (ALE) plot | Improving global feature analysis | Tabular | Global | No | Easy | Medium | Medium | Medium | Suitable for identifying local effects in correlated datasets |
| Friedman’s H-statistic | Detect interactions between features | Tabular | Global | No | Easy | Easy | Medium | High | Has underlying theory from partial dependency decomposition |
| MMD-critic | Identify representative and non-representative data points | Tabular | Global | No | Medium | High | Medium | High | Difficult to select a proper number of prototypes and criticisms |
All these techniques are model agnostic, although there are variants that are model-specific. This means that XAI is quite flexible. Different techniques, or a combination of techniques may be used, dependent on the type of input data, sophistication of the XAI developer, sophistication of the user of XAI, and accuracy and consistency required.
The techniques are readily available in standard or dedicated libraries of statistical programming languages such as Python and R, so that an explainability layer can be added to an AI system with limited effort (a minimal sketch follows the package table below). In a validation where explainability was not embedded during development (e.g. for low-risk systems), it may be beneficial to implement XAI to get a better understanding of the workings of a model and of where risks manifest (
| XAI technique | Python package | R package |
|---|---|---|
| LIME | Lime | Lime |
| SHAP | Shap | Shapr |
| Global surrogate model | scikit-learn (tree) | iml |
| Anchors method | Alibi | Party |
| Counterfactual explanation | DiCE | Counterfactuals |
| Permutation Feature Importance (PFI) | scikit-learn (inspection) | vip, iml, DALEX |
| Partial Dependence Plot (PDP) | scikit-learn (inspection) | pdp, iml, DALEX |
| Individual Conditional Expectation (ICE) | scikit-learn (inspection) | Ice, iml, pdp |
| Accumulated Local Effects (ALE) plot | PyALE | ALEPlot |
| Friedman’s H-statistic | Artemis | Iml |
| MMD-critic | mmd-critic | eummd |
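As a small illustration of how little code such a layer requires, the sketch below applies two of the tabulated techniques, Permutation Feature Importance and a Partial Dependence Plot, via scikit-learn’s inspection module. It reuses the credit risk model from the earlier sketch; availability depends on the installed scikit-learn version.

```python
# Hedged sketch: two of the tabulated techniques applied with standard
# scikit-learn tooling (reusing model, X_test and y_test from the earlier sketch).
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

# Permutation Feature Importance (global): drop in test accuracy when a
# feature's values are shuffled, averaged over repeats.
pfi = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for name, value in sorted(zip(X_test.columns, pfi.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {value:.3f}")

# Partial Dependence Plot (global): average effect of 'Credit amount' and
# 'Duration' on the predicted default probability.
PartialDependenceDisplay.from_estimator(model, X_test, ["Credit amount", "Duration"])
plt.show()
```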
While XAI offers significant benefits when tailored to stakeholders’ needs, it also comes with notable limitations. Therefore, just like AI itself, XAI cannot be implemented as a tool that will automatically resolve all transparency, human oversight, and fairness issues. Expert involvement is essential in choosing how to apply XAI, which method(s) to use, how to interpret results, how to communicate them, and in forming an opinion on the AI system in areas where XAI was not applied. The most important limitations are:
Internal auditors will play a critical role in evaluating AI systems in the context of meeting the transparency, human-oversight and fairness requirements of the EU AI Act. See also our previous article (
While the EU AI Act does not explicitly assign responsibilities to internal auditors, it imposes clear compliance and documentation obligations on deployers of high-risk AI systems. Deployers are required to conduct a fundamental rights impact assessment (
Considering that internal auditors will be tasked with providing assurance that the outputs of AI systems can be understood and explained, not just for their functionality, but also to verify adherence to fundamental rights, safety, and ethical principles as mandated by the EU AI Act (
The EU AI Act also does not mandate a specific assurance reporting format for providing assurance on AI systems. In order to have a structured approach for AI assurance and reporting purposes, internal auditors could for example align with the International Standard on Assurance Engagements 3000 (ISAE3000) (
In the context of AI, internal auditors would provide assurance by evaluating whether the AI system complies with the EU AI Act’s requirements: assessing the design and effectiveness of internal controls related to AI governance, transparency, and oversight, and reviewing documentation such as risk assessments, logs, and human oversight protocols. If an AI system lacks these capabilities and has not been challenged by a second line (especially in a non-financial-institution environment where a second-line function may be absent), internal auditors may need to assess it by implementing an explainability layer themselves, effectively taking up a combined second- and third-line role. When assessing an AI system that is already implemented and in use, it may not be sufficient for the internal auditor merely to note a design deficiency where an explainability layer is missing; the internal auditor may also want to test operational effectiveness by independently adding an explainability layer to the system. It is likely that external expertise would be required to support the internal audit function; Chapter 3 helps to understand the capabilities to look for when hiring such expertise. Where a second-line function is available, it is more likely that the internal auditor feeds findings back to the first and second line to resolve.
In case explainability is part of the system, internal auditors must evaluate whether it is sufficient for the system’s intended use and aligns with users’ ability to interpret it. As highlighted by the European Confederation of Institutes of Internal Auditing (ECIIA), it is considered good practice for organizations to establish policies, procedures, or guidelines that define explainable AI (XAI) requirements and their practical implementation to support compliance efforts (
As AI systems become more integral to organizational decision-making, Explainable AI (XAI) serves as a key mechanism for internal auditors to evaluate whether these systems comply with the EU AI Act. Specifically, XAI supports critical assessments of transparency, human oversight and fairness, which are central obligations under Articles 10, 13, and 14 of the EU AI Act (
XAI enhances transparency by making AI decision-making processes understandable to human stakeholders. For internal auditors, this involves ensuring that the decision logic behind AI outputs can be clearly articulated, verifying whether input features, processing steps, and model decisions are documented and interpretable, and assessing whether users can understand how outcomes are generated.
XAI also plays a role in evaluating fairness, particularly in ensuring that AI systems do not produce discriminatory outcomes. Internal auditors should apply XAI to detect bias patterns in both training data and model logic.
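One possible way to operationalize such a bias check, sketched below for illustration rather than prescribed by the EU AI Act, is to compare predicted default rates across a sensitive attribute and to check how strongly the model relies on it. The sketch reuses the model and the SHAP values (sv) from the earlier sketches; the dummy column name depends on how the data were encoded.

```python
# Hedged sketch: a simple fairness probe on the credit model, comparing
# predicted default rates across a sensitive attribute and checking how much
# the model relies on it (reusing model, X_test and the SHAP values sv above).
import numpy as np

# Column name depends on the one-hot encoding; adjust as needed (assumed name).
sex_col = next(c for c in X_test.columns if c.lower() == "sex_female")
preds = model.predict(X_test)

# Group-level outcome comparison (a rough demographic-parity check).
for group, mask in {"female": X_test[sex_col] == 1, "male": X_test[sex_col] == 0}.items():
    print(f"Predicted default rate ({group}): {preds[mask.values].mean():.2%}")

# How much does the model lean on the sensitive feature, relative to others?
mean_abs = np.abs(sv).mean(axis=0)
rank = list(X_test.columns[np.argsort(-mean_abs)]).index(sex_col) + 1
print(f"'{sex_col}' ranks {rank} of {len(mean_abs)} by mean |SHAP| contribution.")
```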
By enabling transparent inspection of model behavior, XAI also supports compliance with Article 10 on data quality and governance and helps uphold broader EU human rights and non-discrimination principles. According to Article 14, AI systems must allow for meaningful human oversight, enabling human intervention where necessary. XAI contributes to this by providing actionable explanations that support human operators in overriding or correcting AI outputs.
Internal auditors can use these insights to evaluate whether oversight mechanisms are not just formally present but also functionally effective. It is thus considered good practice for an organization to implement policies, procedures, or guidelines outlining XAI requirements and their application to enable this.
From an EU AI Act compliance perspective, particularly concerning high-risk systems, as well as for risk management practices for systems with a different classification, outputs need to be understandable and explainable. This means that during the development phase, consideration should be given to either designing interpretability into the system or layering explainability on top of it. If explainability is absent, this indicates a design deficiency, potentially leading to biases and unwanted behavior. Depending on the risk and impact of the system, this may be a blocking issue, in which case there is no added value for the internal auditor for testing operational effectiveness. Alternatively, internal auditors may test operational effectiveness by adding an explainability layer during their review, e.g. also in case the system is already in use (see also Chapter 4). In either case, internal auditors must assess whether the appropriate XAI techniques are employed to satisfy explainability requirements. These requirements must be clearly defined, addressing characteristics such as transparency and user comprehension.
However, it is important to note that XAI does not directly equate to meeting the Transparency and Human Oversight requirements outlined in the EU AI Act. Transparency and Human Oversight entail broader considerations, such as ensuring meaningful human intervention and accountability at critical points in the AI lifecycle, as described in the EU AI Act. While XAI may enhance explainability, internal auditors should carefully evaluate whether the organization’s approach to XAI truly addresses the EU AI Act’s regulatory standards or if additional measures are needed to meet these obligations.
The potential non-compliance and liability risks associated with incorrect decisions made by AI systems, whether direct or indirect, underscore the need to substantiate why the system made certain decisions. This is where human oversight becomes essential. High-risk systems must incorporate human interaction within their processes. For systems classified differently, human oversight is required when incorrect outcomes occur and affected individuals require an explanation. In both scenarios, users need to understand the system’s outputs and be able to provide localized explanations. Audit testing should ensure organizations have processes in place to uphold fundamental rights under the EU AI Act.
Any model requires explainability, as the ability for customers to file a complaint with a market surveillance authority if they suspect a violation applies to all AI systems covered by the regulation, not just high-risk systems. The EU AI Act also gives individuals the right to an explanation for decisions made by high-risk AI systems listed in Annex III, with some exceptions. Affected individuals must receive clear explanations about the AI system’s role in decision-making and the key factors influencing the outcome. To support this, organizations can use XAI to demonstrate transparency by providing insights into how AI systems operate. Organizations must also show that users, including those handling complaints, are properly trained to understand the system and its explainability features, ensuring compliance with the Regulation.
For the internal auditor it is important to understand that different AI systems may deliver varying levels of accuracy or stability over time. To ensure a thorough understanding of the model, performance-related information should be part of the explainability process. Well-designed XAI systems incorporate this into the explanations provided to users.
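To illustrate how performance information could be bundled with a local explanation for, say, a complaint handler, consider the hedged sketch below. The wording, the number of drivers shown, and the reuse of the earlier SHAP values (sv) are assumptions for illustration only.

```python
# Hedged sketch: a human-readable, local explanation for a complaint handler,
# combining the top SHAP contributions for one loan with the model's test
# performance (reusing model, X_test, y_test and sv from earlier sketches).
import numpy as np

def explain_decision(i: int, top_n: int = 3) -> str:
    proba = model.predict_proba(X_test.iloc[[i]])[0, 1]
    accuracy = model.score(X_test, y_test)  # report performance alongside the explanation
    top = np.argsort(-np.abs(sv[i]))[:top_n]
    drivers = ", ".join(f"{X_test.columns[j]} ({sv[i][j]:+.2f})" for j in top)
    return (
        f"Estimated default probability: {proba:.0%} "
        f"(model test accuracy: {accuracy:.0%}). Main drivers: {drivers}."
    )

print(explain_decision(0))
```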
The internal auditor’s responsibilities extend beyond the development phase. Ongoing monitoring and review of the AI system, guided by established policies, must include an assessment of the XAI layer’s effectiveness. A user feedback loop should also be implemented, enabling users to consistently provide input on the system’s performance, particularly regarding any malfunctions or areas for improvement. This feedback is essential for future system improvements, ensuring both functionality and explainability remain robust over time. Below, we break down key areas in greater detail to help internal auditors deliver impactful results.
Internal auditors evaluating an organization’s ability to leverage XAI to meet the transparency, human oversight and fairness requirements outlined in the EU AI Act will require a structured approach to ensure compliance.
The first step would be to understand the organization’s AI governance framework (see also our previous article,
The internal auditor should also assess the technical capabilities of the AI system. This includes determining if the system provides understandable and accurate explanations for its decisions or outputs (the transparency). It needs to be assessed to what extent the explainability or interpretability of the model meets the standards required for transparency under the EU AI Act. Internal auditors should also assess the technical documentation provided with the system.
As we have seen, an important aspect of compliance is ensuring that human oversight mechanisms are in place. Internal auditors should assess whether human reviewers have the necessary tools, authority, and expertise to oversee AI decisions. This includes checking for procedures, workflows and tollgates that allow humans to intervene or override decisions made by the AI system in case of errors or ethical concerns.
In addition to transparency and human oversight, internal auditors should assess fairness by reviewing records regarding the functioning of the AI system, particularly for systems employing XAI. This includes logs of system decisions, interventions, and updates to the model or data. These records, in essence, provide evidence of compliant operations over a period of time, which is very useful when performing any sort of Test of Effectiveness (ToE) on the AI system.
Internal auditors should also compare the organization’s practices with the guidelines, standards, and best practices provided by regulatory bodies on transparency and human oversight in relation to XAI. Industry standards from ISO/IEC, NIST and, of course, the IIA can serve as a valuable reference for compliance evaluation.
One of the key tools for internal auditors to leverage is the IIA’s updated AI Auditing Framework (
Through the IIA AI Auditing Framework, internal auditors are encouraged to benchmark organizational AI practices against regulatory guidance (e.g., EU AI Act, U.S. Executive Orders) and industry standards such as ISO/IEC 22989 (AI Concepts and Terminology), ISO/IEC 23894 (AI Risk Management) and the NIST AI Risk Management Framework (
While the IIA AI auditing framework does not prescribe a single method for XAI, it acknowledges the growing importance of explainability in AI governance. It suggests that internal auditors should evaluate whether the AI system provides meaningful explanations to users and stakeholders. They should also assess whether an XAI layer is documented, used appropriately and confirm that human oversight mechanisms are in place and effective.
In conclusion, this article makes it clear that XAI can play a crucial role in enabling internal auditors to assess compliance with the transparency, human oversight, and fairness requirements outlined in the EU AI Act. For any kind of application, and any level of risk, the design needs to include a mechanism by which the outcome of individual cases can be explained. The internal auditor needs to test whether the design of the model is effective from that perspective and compliant with the EU AI Act. Whether the design proves effective or not, XAI can equip internal auditors to test the operating effectiveness of the core AI system (see Chapter 4). These design and operating effectiveness tests are fundamental to assessing adherence to the regulatory requirements of the EU AI Act.
One of the primary ways XAI supports internal auditors, is through its ability to produce detailed, human-readable explanations of AI-driven decisions. This feature ensures that internal auditors can trace the logic behind specific outcomes, identify potential biases or errors, and verify whether decisions align with the organization’s ethical and operational objectives. Such transparency is critical for demonstrating compliance with the EU AI Act, which emphasizes accountability and the need for documented processes in the deployment of AI systems. Additionally, XAI’s capacity to generate detailed logs, track system updates, and explain decision pathways makes the traceability and auditability of AI systems possible. These capabilities allow internal auditors to maintain a record of system operations, making it easier to evaluate changes over time and ensure ongoing alignment with regulatory frameworks.
Importantly, this article also lists important limitations to the use of XAI. In addition to explainability, the integration of human oversight mechanisms, as outlined in the EU AI Act, ensures organizations remain accountable. The incorporation of these mechanisms into XAI-supported processes enables protocols for intervention in cases of anomalies, errors, or decisions with potentially adverse consequences. Internal auditors can use XAI to identify these issues proactively, ensuring timely corrective actions are taken.
From a practical perspective, by aligning XAI practices with established industry standards and frameworks, such as those provided by the IIA, ISO/IEC, and NIST, internal auditors can ensure their processes are structured and consistently support compliance assessments. This alignment not only supports internal auditors in validating AI system operations, but also enhances the credibility of their findings, as they are based on industry best practices.
V.A. Damen RE, CISA – Vincent, Associate Director Internal Audit & Financial Audit, Protiviti The Netherlands.
Drs. M.R. Wiersma CFA, FRM, ERP – Menno, Senior Manager Model Risk Management, Protiviti The Netherlands.
G. Aydin LLM, CIPM, CIPP/E, PRMIA – Gokce, Operational Risk Certified, Senior Consultant Risk & Compliance, Protiviti The Netherlands.
R. van Haasteren BSc – Rens, Artificial Intelligence Intern, Protiviti The Netherlands.