Research Article
Corresponding author: Menno Wiersma (menno.wiersma@protiviti.nl) Academic editor: Annemarie Oord
© 2022 Iuliana Sandu, Menno Wiersma, Daphne Manichand.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits copying and distribution of the article for non-commercial purposes, provided that the article is not altered or modified and the original author and source are credited.
Citation:
Sandu I, Wiersma M, Manichand D (2022) Time to audit your AI algorithms. Maandblad voor Accountancy en Bedrijfseconomie 96(7/8): 253-265. https://doi.org/10.5117/mab.96.90108
Undoubtedly, the use of algorithms, and Artificial Intelligence (AI) algorithms in particular, has numerous benefits. Fields such as finance, healthcare, automotive, education, and recruitment, to name a few, have demonstrated successful application of AI algorithms. Conversely, cases of bad algorithms abound and lead to lost revenue, discrimination, disinformation, or even bodily harm. Currently, we have surpassed the stage of just observing bad algorithms. New European regulations governing AI force organizations to manage the risks introduced by algorithms and convince the public about the proper functioning of algorithms. In this context, can algorithms be rigorously audited to build public trust and if yes, how? This article aims to answer these questions by building on an auditing framework for model risk management that controls for the novelty introduced by AI algorithms while connecting AI algorithm audit with internal audit terminology.
Keywords: Artificial Intelligence, audit, algorithms, internal audit, model risk
The article aims to guide internal auditors in the task of auditing Artificial Intelligence algorithms.
The urgency to audit AI algorithms is intensified by regulatory action in the European Union. Upcoming regulations such as the EU Artificial Intelligence (AI) Act require organizations to demonstrate that the risks introduced by their algorithms are identified and managed.
Besides current regulatory pressures, the need to audit AI algorithms is fundamentally driven by the fact that individuals and organizations increasingly base their decisions on data and algorithms.
There are several dimensions that make AI algorithms impactful but also unpredictable and difficult to control. One particularly salient dimension is the powerful but opaque (or “black-box”) nature of some AI algorithms (Burrell, 2016), where it is possible to model complex relationships between data, but it is often difficult to fully understand why the algorithm produced a certain output. Another dimension is the data-intensive nature of some AI algorithms, which is somewhat of a double-edged sword: on the one hand it allows for much more fine-grained and precise modeling, but on the other hand it makes the algorithm strongly dependent on the volume, quality and representativeness of the data it is trained on.
In light of the recent regulatory changes, the established societal impact of AI algorithms and the dimensions that make AI algorithms powerful but unpredictable, various attempts have been made to audit AI algorithms.
In this article, we perform a literature review aimed at gaining insights into the elements that are important for the audit of AI algorithms. We use a general audit process definition according to which “auditing is the accumulation and evaluation of evidence about an audit object to determine the degree of correspondence between the characteristics of the audit object and established criteria”. By using such a general definition of auditing, we can borrow from other auditing frameworks that use audit objectives, audit criteria and collected evidence to assess whether the audit object characteristics comply with the audit criteria. More specifically, we build on the Model Risk Management approach.
The article continues with an overview of the European Commission’s Artificial Intelligence Act, marking the importance of algorithm audit; discusses the challenges that algorithms introduce for audit; investigates different approaches to controlling the risks of algorithms; and ends with a framework aimed at the audit of algorithms.
A landmark regulation aimed at governing AI algorithms is the Artificial Intelligence Act issued by the European Commission (the AI Act).
The AI Act uses a broad definition of AI: “artificial intelligence system (AI system) means software that is developed with one or more of the techniques and approaches listed in Annex I and can, for a given set of human‑defined objectives, generate outputs such as content, predictions, recommendations, or decisions influencing the environments they interact with”.
The risk‑based approach proposed by the AI Act recognizes three classes of risk: unacceptable risk, high risk and low or minimal risk. AI applications listed as carrying unacceptable risk are prohibited. Examples of this risk type are practices using techniques beyond a person’s consciousness, social scoring techniques likely to cause physical or psychological harm, activities exploiting vulnerabilities of specific groups, or use of real‑time remote biometric systems in publicly accessible spaces. High risk AI essentially consists of two lists of industries and activities that to date are recognized as high risk. The first list describes AI systems used as products or safety components of products covered by the sectorial Union law (for example, machinery, personal protective equipment, radio equipment, medical devices, transportation). The second list consists of “other” AI applications with risks that have already materialized or are likely to materialize (for example, biometric identification of natural persons, supply of water, recruitment, access to public benefits and services, access to or assessment in educational and vocational training, creditworthiness, asylum and border control, administration of justice). Low or minimal risk applications are AI systems that are not prohibited or have a high risk.
Before placement on the market, low risk systems have the possibility, but not the obligation, to follow a code of conduct on a voluntary basis. High risk applications described in Annex III of the AI Act should undergo a conformity assessment and show that they have in place: a risk management system; data and data governance practices; technical documentation; record-keeping (logging); transparency and provision of information to users; human oversight; and an appropriate level of accuracy, robustness and cybersecurity.
Overall, the current phrasing of the regulation, from the AI definition to the division into risk categories, is deliberately broad so that it accommodates the dynamic nature of AI techniques and risks. Consequently, many stakeholders commented in their position papers that the definition of high risk was unclear or needed improvement. For example, the European Central Bank (ECB) suggested excluding specific creditworthiness applications for natural persons from the high‑risk category.
It is relevant to mention that algorithms and AI are in the focus of US regulators as well. Since approximately 2013, US financial institutions have been regulated on AI under the Supervisory Guidance on Model Risk Management SR 11‑7.
In the Cambridge Dictionary, an algorithm is defined as “a set of mathematical instructions or rules that, especially if given to a computer, will help to calculate an answer to a problem”.
Three main aspects make AI algorithms challenging to control: the input data used to train AI algorithms, the way the algorithm operates, and the autonomous learning performed by an algorithm. It is problematic if the data used to train the algorithm is unrepresentative of the group to which the algorithm will be applied. If the algorithm is trained predominantly on European individuals but the intention is to use it on a diverse population spanning other countries, then the algorithm might not function properly as it was trained on the wrong data. Additionally, many AI algorithms are applied without a thorough understanding of how they work and whether they can answer the problem for which they are used. It is inaccurate to use algorithms aimed at predicting quantitative variables, such as amount of sales, to predict qualitative variables, such as impairment or no impairment. Lastly, it is often not clear what the algorithm learns from the data, and whether it learns to spot the “right” aspect of a problem. For example, a failure of AI was documented when, instead of learning to identify cancerous lesions, the AI learned to identify images coming from a specific piece of equipment.
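The representativeness concern described above can be made concrete with a simple check that compares category proportions in the training data against the target population. The function name, categories and the 10% tolerance below are our own illustrative assumptions, not part of any standard; a sketch of the idea rather than a prescribed procedure:

```python
from collections import Counter

def representativeness_gaps(train_labels, population_shares, tolerance=0.10):
    """Flag categories whose share in the training data deviates from the
    target population share by more than `tolerance` (absolute difference)."""
    n = len(train_labels)
    counts = Counter(train_labels)
    gaps = {}
    for category, pop_share in population_shares.items():
        train_share = counts.get(category, 0) / n
        if abs(train_share - pop_share) > tolerance:
            gaps[category] = (round(train_share, 2), pop_share)
    return gaps

# Training data dominated by one region, while the algorithm will be
# deployed on a more diverse population (hypothetical shares).
train = ["EU"] * 90 + ["US"] * 8 + ["Asia"] * 2
population = {"EU": 0.40, "US": 0.30, "Asia": 0.30}
flagged = representativeness_gaps(train, population)
# 'EU' is over-represented and 'US'/'Asia' are under-represented.
```

An auditor could run such a comparison for every attribute along which the deployment population might differ from the training sample.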
Another challenge specific to AI algorithms is that they often use a large volume of structured (for example, tabular data) and unstructured data (for example, video data) which makes it difficult to ensure data integrity and representativeness. Additionally, many AI algorithms are self‑learning, continuously improving the algorithm, or adapting to changing circumstances. This introduces difficulties for the validation of the algorithm, which changes from a point‑in‑time validation to a frequent or continuous (automated) monitoring situation. It also requires storing all algorithm changes and historical data used to train the algorithm. AI algorithms are also sensitive to the selection of hyperparameters. Usually simple algorithms, with few or no hyperparameters, can lead to underfitting (missing the underlying patterns in the data), while complex algorithms, with multiple hyperparameters that need tuning, tend to overfit (give a lower training error than the actual test error). In both cases this leads to poor performance of the algorithm in out‑of‑sample testing or production.
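The overfitting point can be illustrated with a toy example. The k-nearest-neighbour classifier below is our own illustrative choice (not one the article prescribes): with k=1 the training error is zero by construction, yet a single mislabeled training point propagates to nearby test points, while k=3 smooths the noise away.

```python
def knn_predict(train, x, k):
    """Majority vote over the k training points closest to x."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    vote = sum(label for _, label in neighbours)
    return 1 if vote > 0 else -1

def error_rate(train, points, k):
    wrong = sum(1 for x, y in points if knn_predict(train, x, k) != y)
    return wrong / len(points)

# True rule: label = sign(x). The training point at -1 carries a noisy label.
train = [(-3, -1), (-2, -1), (-1, +1), (1, +1), (2, +1), (3, +1)]
test = [(-2.4, -1), (-1.4, -1), (1.4, +1), (2.4, +1)]

train_err_1 = error_rate(train, train, 1)  # 0.0: k=1 memorises the noise
test_err_1 = error_rate(train, test, 1)    # the noisy point misclassifies -1.4
test_err_3 = error_rate(train, test, 3)    # majority vote absorbs the noise
```

The zero training error of the k=1 model, combined with its higher out-of-sample error, is exactly the "lower training error than the actual test error" signature of overfitting described above.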
Considering the challenges introduced by algorithms, and especially AI algorithms, the question arises how an audit on algorithms should be executed and who should audit algorithms. A domain with experience in providing auditing services is the internal audit domain. The main purpose of internal audit is to add value and improve an organization’s operations and it does so by performing independent assessments on the effectiveness of governance, risk management, and control processes in an organization.
If auditing algorithms is expected from internal audit, then the internal auditor should have sufficient skills and experience to perform such an audit. This again makes the role of the internal auditor with respect to algorithms unclear. Various stakeholders still consider that internal auditors should work within the apparent current scope of their main activities and focus on operational and financial risks, on the governance and process rather than on the correctness of the algorithms themselves.
According to Carawan et al. (2018), internal auditors now have an important role in the Model Risk Management domain, as internal auditors can be tasked with assessing the effectiveness of the Model Risk Management framework used by financial institutions, including the governance, policies, procedures, and activities conducted to address the risk of model error. Internal auditors are also responsible for understanding when and how the model is used and whether this is in line with the model’s stated purpose. Current, largely undocumented, practice in model risk management also reveals that internal auditors manage to perform a rigorous audit of model risks only if they have knowledge of the model as well. One indication of this is the trend of including professionals with model knowledge (e.g., statisticians) in audit teams. Similar to the Model Risk Management domain, where the prevalent use of models by financial institutions pushed the internal auditor into assessing the risks of models, the internal auditor working for organizations that use algorithms will have to audit AI algorithms as well. For this, the internal auditor needs a framework as a guide to auditing algorithms.
A first phase of approaches to AI ethics focused on formulating high-level ethical principles for AI. The second phase involves the inclusion of AI ethics principles in professional codes of ethics such that AI has ethics-by-design. For example, the Association for Computing Machinery (ACM) revised in 2018 its 1992 Code of Ethics to include ethics principles for software developers. Such codes of ethics should guide AI developers to the right choices when faced with ethical dilemmas (for example, how much and what user data to collect). But placing the responsibility for developing ethical AI solely on the shoulders of AI developers can turn out to be ineffective when codes of ethics do not actually influence the choices of AI developers.
The third and current phase of AI ethical approaches concerns the standardization and operationalization of AI ethics. In this stage of AI maturity, the focus is placed on risk assessments of AI systems. In a risk-based approach to auditing AI, teams of people close to the problem, highly familiar with the environment where the AI system operates, would sit together with the auditors and generate a risk registry of everything that can go wrong with the algorithm. But in the case of AI, where data combines with algorithms and user behavior in unknown ways, teams of AI developers and auditors might not be able to anticipate important risks. In this case, the diversity of the team, not only in terms of expertise but also in terms of aspects such as race, gender, education, and cultural background, might be key to compiling a good risk registry. For example, while a team of Dutch developers of a globally deployed image captioning AI system might not identify as a risk the poor performance of the AI when tasked to identify celebrities from Taiwan, an international AI review team might.
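One way to make such a risk registry concrete is a small structured record per identified risk, scored on likelihood and impact so that audit attention can be prioritised. The field names, scales and example entries below are our own illustration, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class RiskEntry:
    description: str
    raised_by: str   # role or background of the person who spotted the risk
    likelihood: int  # 1 (rare) .. 5 (almost certain)
    impact: int      # 1 (negligible) .. 5 (severe)

    @property
    def score(self):
        # Simple likelihood-times-impact score for prioritisation.
        return self.likelihood * self.impact

registry = [
    RiskEntry("Captioning model performs poorly on non-Western celebrities",
              raised_by="international review team", likelihood=4, impact=4),
    RiskEntry("Training images leak user location metadata",
              raised_by="privacy officer", likelihood=2, impact=5),
]

# Audit attention goes to the highest-scoring risks first.
prioritised = sorted(registry, key=lambda r: r.score, reverse=True)
```

Recording who raised each risk also lets the auditor check whether the registry reflects input from a sufficiently diverse team.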
SMACTR (Scoping, Mapping, Artefact Collection, Testing and Reflection), the internal audit framework proposed by Raji et al. (2020), provides a structured, end-to-end process for organizations to audit their own AI systems before deployment.
Guided by commonly recognized ethical principles, the framework requires concrete audit artefacts in each of its stages, from scoping documents to a final summary report.
A more encompassing framework which tries to operationalize the audit of algorithms is the framework published in 2021 by an extensive team of researchers gathered from different universities and institutes, including University College London and the London Stock Exchange, here referred to as the DAMA framework.
The DAMA framework presents an interesting view on how a small number of interacting dimensions can lead to numerous auditing activities, and on how these activities and the subsequent certification of the algorithmic system are impacted by the level of access to the algorithm. It is also valuable to see the focus of the Assessment dimension on four main verticals inspired by the most recent developments in Fair AI: fairness, robustness, explainability and privacy. Still, the operationalization of the DAMA framework is not included in the paper, and so the concrete steps to take or items to check in an audit remain unclear.
On September 22, 2020, the Netherlands Court of Audit (NCA) organized a thinking session with more than 30 experts on 5 themes related to AI algorithm use: data driven working, data quality, AI and algorithms, AI at the government, and transparency. One important conclusion of the thinking session was that due to the way algorithms are developed (for example, using historical data which might be biased or inserting bias through development choices), we can never have certainty that algorithms do not discriminate. As such, necessary controls need to be placed on algorithms. In its 2021 publication ‘Aandacht voor Algoritmes’, the NCA subsequently published an assessment framework for the algorithms used by the Dutch central government.
In 2021 NOREA, the Dutch Association of chartered IT‑auditors, released the Guiding Principles for Trustworthy AI Investigations.
The DAMA, SMACTR, NCA and NOREA conceptual frameworks highlighted in this section are aimed at the ethical risks of AI algorithms. In this article we argue that there is a need for a more comprehensive framework, which goes beyond ethical risks to consider other business risks of AI algorithms and of algorithms in general. We further attempt to step beyond purely conceptual frameworks and design a framework which can be more readily implemented by an auditor in an audit of an algorithm. As such, we develop normative statements (or positions) against which an algorithm can be checked. These normative statements, which describe how things should work and thus act as audit objectives, are meant to resonate with auditors who use audit objectives in other audit tasks. For example, in an environmental audit, an audit objective (or normative statement) is that “Environmental policies exist during the reporting period as described in the notes to the environmental data”.
All previous work on the audit of algorithms highlighted in the previous section focuses mainly on the ethical risks introduced by AI algorithms, for example, the risk of discriminating against protected groups of people. The upcoming AI Act also addresses ethical issues around AI. Although these are very important risks, it is important to view ethical risks within the bigger picture of the overall risk that the use of AI algorithms carries: the risk that the algorithm produces erroneous results or is used in the wrong way. One area which can serve as a starting point for getting a more comprehensive picture of the risks of algorithms, beyond ethical risks, is the area of model risk management. Conceptually, AI algorithms and models are related in that AI “algorithms operate by learning models from existing data and generalizing them to unseen data”.
The risk management around algorithms can be approached similarly to the Model Risk Management approach, with a “three lines” system.
Inspired by Model Risk Management best practice, we structure the audit of algorithms around the life cycle of an algorithm: initiation, development, implementation, use, monitoring, review and retirement, complemented by validation as an independent challenge function.
To offer more granularity to the Life cycle framework for AI algorithm audit, we present in the Appendix, for each life cycle phase, the risks and the normative statements (audit objectives) against which an algorithm can be assessed.
Returning to Table
To illustrate the application of the Life cycle audit framework, the hypothetical example of an algorithm supporting an online hotel rating system is used. Such a rating system can be attached to the website of an online travel company who lists hotels, and their ratings, based on the search terms of the customers (for example, location, period, amenities). The rating of hotels can, and does, play a role in the decision of customers to reserve a hotel room.
One risk of the rating algorithm is that its purpose is not clearly described and shared with the users (normative statement 2.1 is not met), leading to users being confused by the rating system and eventually not using the rating provided. As an example, users of the rating system might be under the impression that only user reviews matter for the rating, while the rating system has the overall purpose of “rating hotels based on user input and our own domain knowledge”. The introduction of domain knowledge in the algorithm can lead to a different rating (for example, a higher rating) than the one obtained from using only user input (for example, averaging user ratings). The auditor of such an algorithm can check the website of the online travel company to verify if the purpose of the rating algorithm is clearly described.
Another risk of the rating algorithm is that its performance is not checked per sub‑group (normative statement 4.6 is not met). If the algorithm has a lower performance for the low‑to‑medium quality hotels sub‑group, then users can lose trust in the rating and stop using it. In this situation, the algorithm developer can compare the rating of hotels which are rated on different rating platforms (benchmarking). If the results show a significant difference between the ratings for different hotel categories (for example, low‑to‑medium quality) on different platforms (for example, Booking versus Expedia), then this is an indication that the algorithm might not perform as expected per sub‑groups. The auditor might check if the developer has done the benchmarking and if the validator has challenged the benchmarking.
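The benchmarking step described above can be sketched as comparing mean ratings per hotel category across two platforms and flagging categories where the difference exceeds a chosen threshold. The platform data, category names and the 0.5-point threshold are hypothetical illustrations, not figures from the article:

```python
from statistics import mean

def subgroup_benchmark(ratings_a, ratings_b, threshold=0.5):
    """Compare the mean rating per hotel category across two platforms and
    return the categories whose means differ by more than `threshold`."""
    flagged = {}
    for category in ratings_a.keys() & ratings_b.keys():
        mean_a, mean_b = mean(ratings_a[category]), mean(ratings_b[category])
        if abs(mean_a - mean_b) > threshold:
            flagged[category] = (round(mean_a, 2), round(mean_b, 2))
    return flagged

# Hypothetical per-category ratings collected from two platforms.
platform_a = {"luxury": [9.0, 8.8, 9.2], "low_to_medium": [7.8, 8.0, 8.2]}
platform_b = {"luxury": [8.9, 9.1, 9.0], "low_to_medium": [6.4, 6.6, 6.8]}
suspect = subgroup_benchmark(platform_a, platform_b)
# Only the low-to-medium segment diverges between the platforms,
# hinting at possible underperformance for that sub-group.
```

A divergence confined to one segment, as in this sketch, is the kind of evidence an auditor would expect the developer's benchmarking to surface and the validator to challenge.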
Rating algorithms which use external user input suffer from a risk of invalid data (normative statement 5.1 is not met). Fake user reviews used in the rating algorithm can materialize into incorrect ratings which lead to a public mistrust in the rating system. To get protection against fake user reviews, the algorithm developer can prompt users to provide a valid email address when making an account on the rating platform or can use other algorithms to check for the existence of fake user accounts. Here too, the auditor would verify the work done by the developer and whether the validation function sufficiently challenged the developer.
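The fake-account checks mentioned above can be sketched with two cheap heuristics: a syntactic email validity check and detection of addresses reused across accounts. Real platforms use much stronger signals; the regular expression, field names and sample accounts below are illustrative assumptions:

```python
import re
from collections import Counter

# Deliberately simple pattern: one "@", a domain with at least one dot.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def suspicious_accounts(accounts):
    """Flag accounts with malformed email addresses or with addresses
    reused across multiple accounts -- two cheap signals of fake reviewers."""
    email_counts = Counter(a["email"] for a in accounts)
    flagged = []
    for account in accounts:
        email = account["email"]
        if not EMAIL_RE.match(email) or email_counts[email] > 1:
            flagged.append(account["user"])
    return flagged

accounts = [
    {"user": "anna", "email": "anna@example.com"},
    {"user": "bot1", "email": "no-at-sign"},          # malformed address
    {"user": "bot2", "email": "shared@example.com"},  # reused address
    {"user": "bot3", "email": "shared@example.com"},
]
# Reviews from flagged accounts would be excluded from the rating input.
```

The auditor would not rerun such checks themselves so much as verify that the developer has implemented them and that the validator has challenged their adequacy.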
The wide use of algorithms and the breadth of their impact on society have created pressure on governmental institutions to set regulatory boundaries. The EU AI Act is one regulation spurred by this pressure. The EU AI Act prompts organizations to systematically review the algorithms they use and the ethical risks these algorithms pose to society. Such a systematic look can take the form of an algorithm audit. Moreover, an algorithm audit can play a role in managing the risks of algorithms even when ethical risks are not at the forefront. This is important as algorithms come with serious business risks, ranging from providing unreliable information for internal decision-making to fueling a public loss of reputation.
The task of auditing algorithms is not an easy one. There is uncertainty as to who can audit algorithms and how a rigorous audit can be done. The internal audit function can play a role in providing insights into the risks that come with the use of algorithms even when ethical risks are not at the forefront. In this article we argue that internal auditors have in their arsenal processes and skills that can be adapted to the new challenge of auditing algorithms. We specifically pinpoint the Model Risk Management approach as something familiar to the internal audit profession and a good starting point for designing a response to the need for algorithm audit. In this way, by staying connected to the established terminology of the auditing process (e.g., audit objectives, audit criteria) and knowledge (i.e., Model Risk Management), we aim to show that internal auditors play an important role in assessing not only the governance and risk management around the use of algorithms, but also the algorithms themselves.
The article produces a framework aimed to serve as an initial guide for the audit of algorithms. The framework develops audit objectives based on the audit criteria revealed by the literature review but offers only an anecdotal example as to how the framework can be applied for an audit. Further investigation is required into the comprehensiveness of the framework, the potential challenges it might reveal when applied in the field (e.g., challenges in collecting evidence) and the recommended composition of the audit team charged with the audit of an algorithm (e.g., in terms of expertise or sociocultural background). An important extension of this article would aim at the development of practical guidance to be used by the auditor when it comes to auditing the algorithm itself (through the system) as there is currently a gap between what the role of the internal auditor should be (i.e., focused at looking both around and through the system) and their current role (i.e., of looking around the system).
Dr. Iuliana Sandu is Academic Director Trustworthy and Accountable AI expert practice at the Rotterdam School of Management and Erasmus Center for Data Analytics.
Drs. Menno Wiersma CFA, FRM, ERP is Senior Manager at Protiviti The Netherlands and responsible for Model Risk Management.
Ing. drs Daphne Manichand RA is Associate Director at Protiviti The Netherlands and responsible for Internal Audit.
*(statements in blue italic font are aimed specifically at AI algorithms)*

| Lifecycle | Risk | Normative statement (or position) |
|---|---|---|
| Initiation | An algorithm is developed lacking adequate governance, without sufficient support or with too high a risk, leading to unnecessary use of resources or unnecessary costs. | 1. Evidence of adequate governance: |
| | | 1.1 Involved stakeholders (owner, validator, user) are clearly described. |
| | | 1.2 The roles of stakeholders are described and followed (for example, the owner approves the use of the algorithm, the validator fulfils the ‘second pair of eyes’ role for the developer function, the user provides feedback on the use of the algorithm). |
| | | 1.3 A clear segregation between roles that need to be segregated (for example, developer and validator) is in place. |
| | | 1.4 AI development teams are diverse across race, gender, sexual orientation, age, economic conditions and more, depending on the potential for bias. |
| | | 1.5 There is an organizational culture of involving experts for independent feedback on the algorithm in all its life cycle phases, rather than taking algorithm outcomes as truth (for example, domain experts are involved in the Use phase of the algorithm). |
| | | 1.6 The algorithm is correctly listed in the inventory of algorithms. |
| | | 2. Preliminary purpose, regulatory environment and risk description: |
| | | 2.1 The purpose of the algorithm is described and shared with the users. |
| | | 2.2 The regulatory risk category (high or low) is determined and specified. |
| | | 2.3 The regulatory requirements are listed (for example, the obligation to perform a conformity assessment as per the EU AI Act) and used in the development of the algorithm (for example, version control is used in the development of the algorithm). |
| Development | Inadequate skills are used for the development and/or it is done without sufficiently understanding the context, causing the data or algorithm soundness to miss standards, which compromises the algorithm outcomes. | 3. Documentation requirements are followed: |
| | | 3.1 The internal guidelines (for example, codes of ethics used, corporate ESG values) are applied and documented. |
| | | 3.2 The input data and the output of the algorithm are described (for example, in datasheets for datasets). |
| | | 3.3 How the model works is adequately documented (for example, in model cards). |
| | | 3.4 A risk registry with all the potential harms that can be caused by the algorithm is in place. |
| | | 3.5 For all new algorithms, an impact assessment is performed, including the documentation of all the possible risks, including ethical risks. |
| | | 4. The soundness of the algorithm is evidenced in the documentation: |
| | | 4.1 The choice of the algorithm and its settings (for example, hyperparameters) is sound and based on theoretical foundations (for example, benchmarked against previous algorithm uses), leading the algorithm to correctly identify relationships existing in the data as opposed to capturing noise in the data. |
| | | 4.2 The choice of the algorithm is in line with the context where it is applied (for example, the chosen algorithm is in line with the business purpose; the algorithm development choices with respect to hyperparameters or other settings are in line with the data available, as more complex algorithms might require larger data samples). |
| | | 4.3 Approaches to improve the algorithm (for example, regularization, activation functions, optimizers) are used soundly. |
| | | 4.4 Any external tools used (for example, a text parser which extracts features from text data) are understood. |
| | | 4.5 Mitigation measures are in place for risks and they are used conservatively (for example, even if there is only a potential risk to privacy, privacy constraints are placed on the algorithm at development). |
| | | 4.6 Tests (for example, performance accuracy per subgroup, sensitivity/scenario analysis, statistical fairness tests, overfitting detection) are executed to validate the performance of the algorithm. |
| | | 4.7 The results of the algorithm are benchmarked against subject matter expert opinions or other benchmarks (for example, results from other platforms or other algorithms). |
| | | 4.8 If necessary, an expert’s opinion on the algorithm or on the data is used. |
| | | 4.9 If applicable, expert opinions which are overridden are listed and justified. |
| | | 4.10 Assumptions and limitations of the algorithm are described (for example, in the model card of the algorithm). |
| | | 4.11 The outcomes of the algorithm are in line with corporate ESG values (for example, the results of an algorithm do not interfere with the value of ‘diversity’). |
| | | 4.12 For high-risk applications, decisions made by algorithms can be explained. |
| | | 5. Data quality is satisfactorily described in the documentation: |
| | | 5.1 Data is of high quality: complete (for example, data is not biased such that it misrepresents protected groups), consistent, unique, timely, accurate and valid. |
| | | 5.2 Data transformations (for example, scaling, missing data imputation, feature engineering) are correct. |
| Implementation | The implementation does not match the developed algorithm, data feed quality is poor, or the implementation allows for wrong use of the algorithm, which compromises the outcome during use. | 6. Implementation documentation is up to requirements and in line with development: |
| | | 6.1 The implementation process is documented (for example, the implementation might take place through a randomized controlled experiment). |
| | | 6.2 The algorithm design is specified (for example, in a method card). |
| | | 6.3 Changes to the algorithm or the data are described and documented. |
| | | 6.4 There is Functional and User Acceptance Testing documentation, especially for external tooling. |
| | | 6.5 Technical roles and permissions are defined. |
| | | 7. Implementation results (algorithm and output) are in line with the design: |
| | | 7.1 The algorithm prototype (code, data, model, output) is in line with its implementation. |
| | | 7.2 Tests are performed to discover vulnerabilities (for example, fuzz testing). |
| Use | The use of the algorithm is not in line with the design, or vice versa, causing the algorithm to give wrong results which are misaligned with the purpose. | 8. The use of the algorithm is documented, in line with practice and in line with the purpose of the algorithm: |
| | | 8.1 There is documentation as to the use of the algorithm. |
| | | 8.2 The use of the algorithm is aligned with its purpose and with the documentation. |
| | | 9. Training of staff: |
| | | 9.1 The staff has knowledge about how to use the algorithm. |
| | | 10. Evidence of a formal possibility to give feedback and actual feedback from users: |
| | | 10.1 There is a user feedback loop implemented. |
| Monitoring | The algorithm monitoring is not timely or does not track the correct indicators, whereby it cannot be established whether the model continues to perform according to expectations. | 11. Monitoring documentation is in line with requirements and lists indicators with thresholds that signal model performance: |
| | | 11.1 Performance metrics (for example, performance accuracy) and acceptable thresholds are defined. |
| | | 11.2 The frequency of monitoring is adequate and followed up (for example, monitoring might be continuous for self-learning algorithms). |
| | | 11.3 The assumptions and limitations of the algorithm hold for the stated purpose and use of the algorithm. |
| | | 11.4 Conditional approvals (for example, the algorithm is accepted for immediate use with additional screening for bias) are monitored. |
| Review | The algorithm review is not in line with its intended use or is not done in a timely manner, whereby it cannot be assured that the algorithm applied is still sufficiently sound and in line with the intended use. | 12. Documentation is in line with requirements and contains a still-fit-for-purpose analysis, and a conclusion to ‘reparameterize’, ‘improve’ or ‘redevelop’ in line with requirements: |
| | | 12.1 There is a review frequency set that is followed up. |
| | | 12.2 The review contains a still-fit-for-purpose analysis based on monitoring criteria (for example, the use of the algorithm is still in line with its purpose, there is still sufficient knowledge and understanding about the algorithm). |
| | | 12.3 The review provides a description and timing of improvements and planned changes in line with findings/weaknesses. |
| | | 12.4 If concluded by the review, reparameterization is performed (for example, dynamic calibration, where hyperparameters are automatically recalibrated, might be done for self-learning algorithms). |
| | | 12.5 If concluded by the review, improvements are performed. |
| | | 12.6 If concluded by the review, redevelopment is performed. |
| | | 12.7 Previous issues are resolved; findings and recommendations have been implemented according to plan (for example, mitigate a risk by a specific date). |
| Retirement | An algorithm and data which are no longer being used are not retired, clogging up the inventory or allowing wrong use or use without proper maintenance, or an algorithm still being used is retired, causing failure of procedures. | 13. Retirement procedures are followed in line with requirements: |
| | | 13.1 Dependencies of other algorithms are documented. |
| | | 13.2 There is no data redundancy. |
| | | 13.3 The algorithm is correctly reflected in the algorithm inventory. |
| | | 13.4 Algorithm versions and data are stored for audit purposes. |
| Validation | Validation is not performed in line with the process, or is performed without sufficient skills, causing insufficient challenge on the development, implementation and use, which compromises the algorithm quality. | 14. There are effective controls in place to ensure proper model implementation: |
| | | 14.1 The internal guidelines followed are documented (for example, codes of ethics). |
| | | 14.2 There is an evaluation of the risk analysis (for example, including ESG risks) and classification (for example, high risk algorithms are correctly identified). |
| | | 14.3 The algorithm implementation can be replicated from the documentation. |
| | | 15. There is a fair challenge on algorithm soundness and data quality: |
| | | 15.1 There is an evaluation as to whether the development process is suitable to the underlying problem the algorithm is used for (for example, the development team is sufficiently diverse, there is conceptual soundness in the choice of the algorithms). |
| | | 15.2 There is an evaluation of the performance of the algorithm (for example, using statistical tests, k-fold cross-validation, under-/overfitting analysis, sensitivity analysis, backtesting). |
| | | 15.3 Assumptions and limitations of the algorithm are challenged. |
| | | 15.4 There is an evaluation of the data quality. |
| | | 15.5 There is an evaluation of the Extract-Transform-Load process to identify potential problems from how the data is collected (for example, potential bias introduced in the data collection stage). |
| | | 16. Findings and recommendations are in line with the weaknesses found, and with requirements: |
| | | 16.1 The Validation provides findings and recommendations in a timely manner. |
| | | 16.2 The Validation provides a severity level for the risk (for example, untriaged, informational, low, medium, high, or critical). |
| | | 16.3 Developers and users are consulted with respect to the findings, recommendations and severity found. |
| | | 16.4 The conclusions from the Validation stage are followed up. |
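The normative statements in the table above lend themselves to a machine-readable checklist that an auditor fills in per engagement. The structure below is our own minimal sketch, with statement texts abbreviated and pass/fail values chosen for illustration:

```python
# Minimal sketch: normative statements as checklist items, grouped per
# life cycle phase, with a pass/fail summary per phase.
checklist = {
    "Initiation": {
        "1.1 Stakeholders clearly described": True,
        "1.6 Algorithm listed in inventory": True,
        "2.1 Purpose described and shared with users": False,
    },
    "Development": {
        "4.6 Performance tests per subgroup executed": False,
        "5.1 Data quality evidenced": True,
    },
}

def phase_summary(checklist):
    """Return, per phase, how many statements are met and which ones fail."""
    summary = {}
    for phase, items in checklist.items():
        failing = [s for s, met in items.items() if not met]
        summary[phase] = {"met": len(items) - len(failing), "failing": failing}
    return summary

report = phase_summary(checklist)
```

Such a summary makes it immediately visible where evidence is missing (here, hypothetically, the purpose description and the subgroup performance tests discussed in the hotel rating example).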