Auditing AI

In brief

Auditing AI aims to identify and address possible risks and impacts while ensuring robust and trustworthy accountability.

More in Detail

One measure for ensuring that AI is used responsibly is to introduce auditing practices, which help verify whether a system works as intended.

Audits can be conducted either in-house or by external parties. In-house audits require an internal evaluation of whether systems are fit for purpose, whether the human elements involved with the system are appropriate and monitored, and whether the technical elements of the system are in perfect condition and function correctly [2, 3]. External audits involve regulators and third parties verifying compliance [4]. Notably, critical external audits encompass the “disparate methods of journalists, technicians, and social scientists who have examined the consequences of already-deployed algorithmic systems and who have no formal relationship with the institutions designing or integrating the audited systems” [5]. Well-known examples of such practices, such as ProPublica’s examination of Northpointe’s COMPAS recidivism prediction algorithm [6] or the Gender Shades project [7], have played a crucial role in exposing harmful applications of algorithmic systems, drawing public attention to them and pressing companies to take an active role in setting up governance and accountability mechanisms.

In response to these social demands, internal governance mechanisms [5] have been introduced within the very companies that design and deploy algorithmic systems. The goal is to put in place technical and organisational procedures, including detailed frameworks for algorithm auditing [8], that can identify and address possible risks and impacts while ensuring robust and trustworthy accountability. In essence, precise and well-documented audits facilitate later scrutiny by providing a record of why the audit was initiated, which procedures were followed, which conclusions were reached and, where applicable, which remedies or measures were adopted.
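
The four elements of an audit record named above (reason, procedures, conclusions, remedies) could be captured in a small data structure. The following is only a minimal sketch: the class `AuditRecord`, its field names, and the example system are hypothetical and are not taken from any of the cited frameworks.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class AuditRecord:
    """Hypothetical record of one audit; all field names are illustrative."""

    system_name: str
    started_on: date
    reason: str              # why the audit was initiated
    procedures: list[str]    # steps that were followed
    conclusions: list[str]   # findings that were reached
    remedies: list[str] = field(default_factory=list)  # measures adopted, if any

    def to_json(self) -> str:
        """Serialise the record so it can be archived for later scrutiny."""
        payload = asdict(self)
        payload["started_on"] = self.started_on.isoformat()
        return json.dumps(payload, indent=2, sort_keys=True)


record = AuditRecord(
    system_name="loan-scoring-model",  # hypothetical system
    started_on=date(2022, 3, 1),
    reason="Periodic internal review",
    procedures=["Reviewed training data documentation", "Ran adversarial tests"],
    conclusions=["No blocking issues found"],
)
print(record.to_json())
```

Keeping `remedies` optional reflects the point made above that remedies are recorded only if they were actually carried out.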

In this regard, a growing number of voices consider audits an indispensable accountability mechanism for ensuring that AI systems comply, throughout their life cycle, with the applicable legislation, in particular privacy and data protection law [9]. Moreover, AI auditing can benefit from the extensive literature in more mature disciplines, such as audit studies in the social sciences [10] and empirical economics [1]. Audits make it easier for private entities to provide documentation when public bodies request it, favouring systematic governance [11] of AI systems through a general transparency and enforcement regime. This joint effort between public and private institutions would, in turn, result in a collaborative governance scheme [11].

The upcoming EU Artificial Intelligence Act can be seen as a proposal to establish a Europe-wide ecosystem for conducting AI audits [12]. In line with that idea, a growing body of research addresses auditing procedures for algorithms (for reviews see [13, 14]). For example, Raji et al. [8] propose a framework for internal AI auditing that includes both ethical aspects (a social impact assessment and an ethical risk analysis chart) and technical audits (such as adversarial testing and a Failure Modes and Effects Analysis). Such audits are often supported by technical documentation, such as the Datasheets for Datasets proposal [15] for maintaining information on the datasets used to train AI systems. Such documentation can help both to ensure that AI systems are deployed for tasks in line with the data they were trained on and to spot ethical risks stemming from the data [16], such as biases.
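
To give a flavour of the Failure Modes and Effects Analysis mentioned above, the sketch below ranks failure modes by a risk priority number (RPN), computed as severity × occurrence × detection on 1 to 10 scales. This scoring convention comes from classical FMEA practice and is an assumption here; it is not necessarily the procedure prescribed in [8]. The failure modes and their scores are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One hypothetical failure mode, scored on conventional 1..10 scales."""

    description: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (easily detected) .. 10 (hard to detect)

    def __post_init__(self):
        # Reject scores outside the assumed 1..10 range.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Invented example entries; a real analysis would elicit these from the audit team.
modes = [
    FailureMode("Logging outage hides prediction drift", 5, 3, 4),
    FailureMode("Higher error rate for an under-represented group", 8, 5, 6),
    FailureMode("Model used on data outside its training domain", 7, 4, 7),
]

# Highest RPN first, so the riskiest failure modes are reviewed first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.description}")
```

The second invented entry relates to the datasheet point above: documentation of the training data is what would let an auditor judge whether a deployment falls outside the domain the system was trained on.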

Ethical risks can also be the sole focus of AI audits, as in ethics-based auditing (proposed for AI in [17]). While the field is still in development, several options are emerging: “functionality audits focus on the rationale behind decisions; code audits entail reviewing the source code of an algorithm; and impact audits investigate the types, severity, and prevalence of effects of an algorithm’s outputs” [18]. For these audits in particular, determining what to measure can be a challenge, as it is difficult to define clear metrics on which the ethical aspects of AI systems can be evaluated. Fairness metrics (cf. the entry on ) can certainly help here but, as discussed there, it is difficult to select the right metric, and even then there are limitations and trade-offs with other metrics. In addition, when AI ethics is integrated into ESG (Environmental, Social and Governance) reporting to investors [19], such fairness metrics may not give sufficient insight into whether algorithms are used responsibly at an organisational level. Existing ESG criteria for organisational audits may help here, as may work on KPIs for Responsible Research and Innovation [20]. Despite all this work on metrics, it remains an open question to what extent ethics can be captured in numbers the way other aspects of audits are, with some arguing that it is impossible to develop benchmarks for how ethical an AI system is [21]. Instead, they argue, the focus should be on values and value trade-offs.
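
The difficulty of selecting a fairness metric can be made concrete with a toy example. The sketch below compares two commonly used group fairness measures, the gap in selection rates (demographic parity) and the gap in true positive rates (equal opportunity). The choice of these two metrics, the helper functions, and the data are assumptions made purely for illustration; they are not drawn from the cited works.

```python
def selection_rate(y_pred):
    """Share of individuals who receive a positive prediction."""
    return sum(y_pred) / len(y_pred)


def true_positive_rate(y_true, y_pred):
    """Share of truly positive individuals who receive a positive prediction."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_positives) / len(preds_for_positives)


# Invented labels and predictions for two groups, A and B.
groups = {
    "A": {"y_true": [1, 1, 0, 0], "y_pred": [1, 1, 0, 0]},
    "B": {"y_true": [1, 1, 0, 0], "y_pred": [1, 0, 1, 0]},
}

rates = {g: selection_rate(d["y_pred"]) for g, d in groups.items()}
tprs = {g: true_positive_rate(d["y_true"], d["y_pred"]) for g, d in groups.items()}

demographic_parity_gap = abs(rates["A"] - rates["B"])
equal_opportunity_gap = abs(tprs["A"] - tprs["B"])

print(f"demographic parity gap: {demographic_parity_gap:.2f}")
print(f"equal opportunity gap:  {equal_opportunity_gap:.2f}")
```

On this invented data both groups are selected at the same rate, so the first gap is zero, while the second is not, because group B's qualified individuals are selected less often. An audit that reported only one of the two numbers would reach a different conclusion from one that reported the other.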

Z-Inspection, another auditing framework, based on the European High-Level Expert Group’s Guidelines for Trustworthy AI [22], takes values as its starting point [23]. As a case study involving an algorithm that recognizes cardiac arrests in emergency calls shows [24], the framework proceeds from a broad identification of stakeholders and their values, through the analysis of (socio-)technical scenarios, to the identification and (potential) resolution of the ethical, technical and legal issues of an AI system. Ultimately this still depends on the translation of values into metrics, and so the main challenge of developing such metrics stands regardless of one’s auditing approach.

Standards represent a natural framework for the proceduralization of audits. Certification by a neutral third party attests, as the result of an audit, compliance with certain standards. Several draft proposals are being prepared which include (at least implicitly) elements for conducting audits, such as the following:

However, there is not yet a formal professional standard to guide auditors of AI systems, although some guidelines exist.



[1] Andrea Romei and Salvatore Ruggieri. A multidisciplinary survey on discrimination analysis. Knowl. Eng. Rev., 29(5):582–638, 2014.

[2] Ada Lovelace Institute and DataKind UK. Examining the black box: tools for assessing algorithmic systems. Technical report, Ada Lovelace Institute, 2020. URL:

[3] Emre Kazim, Danielle Mendes Thame Denny, and Adriano Koshiyama. AI auditing and impact assessment: according to the UK Information Commissioner’s Office. AI and Ethics, 1(3):301–310, 2021.

[4] Jennifer Cobbe, Michelle Seng Ah Lee, and Jatinder Singh. Reviewable automated decision-making: a framework for accountable algorithmic systems. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 598–609. 2021.

[5] Jacob Metcalf, Emanuel Moss, Elizabeth Anne Watkins, Ranjit Singh, and Madeleine Clare Elish. Algorithmic impact assessments and accountability: the co-construction of impacts. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 735–746. 2021.

[6] Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How we analyzed the COMPAS recidivism algorithm. ProPublica (5 2016), 9(1):3–3, 2016.

[7] Joy Adowaa Buolamwini. Gender shades: intersectional phenotypic and demographic evaluation of face datasets and gender classifiers. PhD thesis, Massachusetts Institute of Technology, 2017.

[8] Inioluwa Deborah Raji, Andrew Smart, Rebecca N. White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing. In FAT*, 33–44. ACM, 2020.

[9] Bryan Casey, Ashkon Farhangi, and Roland Vogl. Rethinking explainable machines: the GDPR's 'right to explanation' debate and the rise of algorithmic audits in enterprise. Berkeley Tech. LJ, 34:143, 2019.

[10] Briana Vecchione, Karen Levy, and Solon Barocas. Algorithmic auditing and social justice: lessons from the history of audit studies. In EAAMO, 19:1–19:9. ACM, 2021.

[11] Margot E Kaminski and Gianclaudio Malgieri. Algorithmic impact assessments under the GDPR: producing multi-layered explanations. International Data Privacy Law, pages 19–28, 2020.

[12] Jakob Mökander, Maria Axente, Federico Casolari, and Luciano Floridi. Conformity assessments and post-market monitoring: a guide to the role of auditing in the proposed European AI regulation. Minds and Machines, pages 1–28, 2021.

[13] Adriano Koshiyama, Emre Kazim, Philip Treleaven, Pete Rai, Lukasz Szpruch, Giles Pavey, Ghazi Ahamat, Franziska Leutner, Randy Goebel, Andrew Knight, and others. Towards algorithm auditing: a survey on managing legal, ethical and technological risks of AI, ML and associated algorithms. SSRN Electronic Journal, 2021.

[14] Danaë Metaxa, Joon Sung Park, Ronald E Robertson, Karrie Karahalios, Christo Wilson, Jeff Hancock, Christian Sandvig, and others. Auditing algorithms: understanding algorithmic systems from the outside in. Foundations and Trends® in Human–Computer Interaction, 14(4):272–344, 2021.

[15] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. Communications of the ACM, 64(12):86–92, 2021.

[16] Karen L Boyd. Datasheets for datasets help ML engineers notice and understand ethical issues in training data. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2):1–27, 2021.

[17] Jakob Mökander and Luciano Floridi. Ethics-based auditing to develop trustworthy AI. Minds and Machines, 31(2):323–327, 2021.

[18] Jakob Mökander, Jessica Morley, Mariarosaria Taddeo, and Luciano Floridi. Ethics-based auditing of automated decision-making systems: nature, scope, and limitations. Science and Engineering Ethics, 27(4):1–30, 2021.

[19] Matti Minkkinen, Anniina Niukkanen, and Matti Mäntymäki. What about investors? ESG analyses as tools for ethics-based AI auditing. AI & SOCIETY, pages 1–15, 2022.

[20] Zenlin Kwee, Emad Yaghmaei, and Steven Flipse. Responsible research and innovation in practice: an exploratory assessment of key performance indicators (KPIs) in a nanomedicine project. Journal of Responsible Technology, 5:100008, 2021.

[21] Travis LaCroix and Alexandra Sasha Luccioni. Metaethical perspectives on 'benchmarking' AI ethics. arXiv preprint arXiv:2204.05151, 2022. URL:

[22] Nathalie Smuha. The EU approach to ethics guidelines for trustworthy Artificial Intelligence. In Computer Law Review International, volume 20, 97–106. 2019.

[23] Roberto V Zicari, John Brodersen, James Brusseau, Boris Düdder, Timo Eichhorn, Todor Ivanov, Georgios Kararigas, Pedro Kringen, Melissa McCullough, Florian Möslein, and others. Z-Inspection®: a process to assess trustworthy AI. IEEE Transactions on Technology and Society, 2(2):83–97, 2021.

[24] Roberto V Zicari, James Brusseau, Stig Nikolaj Blomberg, Helle Collatz Christensen, Megan Coffee, Marianna B Ganapini, Sara Gerke, Thomas Krendl Gilbert, Eleanore Hickman, Elisabeth Hildt, and others. On assessing trustworthy AI in healthcare. Machine learning as a supportive tool to recognize cardiac arrest in emergency calls. Frontiers in Human Dynamics, page 30, 2021.

This entry was written by Alejandra Bringas Colmenarejo, Stefan Buijsman, and Salvatore Ruggieri.