Research highlights:

Certification of AI systems

The ability to assess the risks associated with AI systems will be central to the implementation of the EU’s AI Act. Understanding how to certify AI, and the mechanisms that can deliver trustworthy certification, is essential.

The EU’s AI Act proposes a suite of legislative interventions that aim to ensure citizens can have confidence in AI-enabled products and services by minimising the risks associated with AI deployment. An important pillar of this approach is the development of certification mechanisms to demonstrate that an AI system functions as expected. Certifications provide guidelines for those developing AI to help ensure the resulting products are reliable; they provide information for consumers about product performance; and they can support regulatory assurance processes that guarantee only safe and effective products are brought to market. While AI development has established best practices, to date it has lacked the clear standards and guidelines needed to form the basis of certification mechanisms. Responding to this policy need, researchers at TÜV Austria Group and Johannes Kepler University Linz, supported by the ELISE network, have created a framework for a new AI certification mechanism.

Certification is a process through which an independent body attests that a product or service has been tested against objective performance standards and found to meet them. Developing AI certification mechanisms is challenging for a variety of reasons. Creating and adhering to certification processes requires:

  • a theoretical understanding of core technical concepts;
  • standardised quality assessment processes and clear requirements for the correct use of AI technologies;
  • the ability to account for domain gaps between training and ‘real-world’ data (see the sketch after this list);
  • the ability to accommodate rapid developments in the capabilities of AI technologies;
  • mechanisms to translate ethical considerations into technical review;
  • access to qualified personnel with the skills and knowledge to evaluate AI applications;
  • the ability to evaluate robustness in deployment, including to adversarial attacks.
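
The domain-gap point above can be made concrete with a small experiment: train a model on data from one distribution, then compare its accuracy on held-out data from that distribution with its accuracy on shifted, deployment-like data. The sketch below does this with synthetic data and scikit-learn purely for illustration; it is not part of the proposed certification framework or audit catalogue.

```python
# Illustrative only: quantifying a "domain gap" by comparing a model's
# accuracy on held-out data from the training distribution with its
# accuracy on shifted, deployment-like data. The synthetic data and the
# size of the shift are invented for this example.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    # Two Gaussian classes; `shift` moves the deployment distribution.
    X0 = rng.normal(loc=0.0 + shift, scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=2.0 + shift, scale=1.0, size=(n, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

X_train, y_train = make_data(500)                # training distribution
X_test, y_test = make_data(500)                  # same distribution
X_deploy, y_deploy = make_data(500, shift=1.5)   # shifted "real-world" data

model = LogisticRegression().fit(X_train, y_train)

acc_test = accuracy_score(y_test, model.predict(X_test))
acc_deploy = accuracy_score(y_deploy, model.predict(X_deploy))
print(f"in-distribution accuracy:   {acc_test:.2f}")
print(f"shifted-data accuracy:      {acc_deploy:.2f}")
print(f"accuracy drop (domain gap): {acc_test - acc_deploy:.2f}")
```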


Responding to these challenges, the certification framework proposed in ‘Trusted Artificial Intelligence: Towards Certification of Machine Learning Applications’ sets out a process by which an independent third party could test the quality and safe use of AI applications.

In this certification process, the organisation being certified receives a requirements catalogue, which sets out the capabilities or guarantees an AI application should demonstrate. AI developers can use this catalogue to analyse the performance of their system, identify gaps or deficiencies, and take action to address any safety or security concerns; it can also help define the scope of certification sought by developers and application owners. The documentation produced against the catalogue is submitted to an auditor and used as a basis for interviews with the development team and inspections of their work. The resulting audit report reviews the performance of the system, drawing on an audit catalogue that specifies the different system characteristics that contribute to its safety and efficacy in deployment.
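
As a rough illustration of this self-assessment step, the sketch below models a requirements catalogue as a list of structured entries and flags mandatory requirements for which the development team has supplied no evidence. The requirement IDs, fields, and evidence format are hypothetical; they are not taken from the actual TÜV Austria catalogue.

```python
# A minimal sketch of the self-assessment ("gap analysis") step: represent
# the requirements catalogue as structured entries and flag mandatory
# requirements with no supporting evidence. The requirement IDs, fields,
# and evidence format are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class Requirement:
    req_id: str
    theme: str          # e.g. "functional requirements"
    description: str
    mandatory: bool

CATALOGUE = [
    Requirement("SEC-01", "security in software development",
                "Track and patch known vulnerabilities in dependencies", True),
    Requirement("FUN-01", "functional requirements",
                "Document the data collection and labelling process", True),
    Requirement("FUN-02", "functional requirements",
                "Report model performance on held-out test data", True),
    Requirement("ETH-01", "ethics and data protection",
                "Document the legal basis for any use of personal data", False),
]

# Evidence supplied by the development team, keyed by requirement ID.
evidence = {
    "SEC-01": "dependency_scan_report.json",
    "FUN-02": "evaluation_report.pdf",
}

def gap_analysis(catalogue, evidence):
    """Return mandatory requirements that have no supporting evidence."""
    return [r for r in catalogue if r.mandatory and r.req_id not in evidence]

for gap in gap_analysis(CATALOGUE, evidence):
    print(f"GAP {gap.req_id} ({gap.theme}): {gap.description}")
```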

The audit catalogue proposed by the TÜV Austria Group and Johannes Kepler University Linz team is structured into three main themes:

  • security in software development, which considers the core software capabilities needed to ensure a system is safe and secure in deployment;
  • functional requirements, which covers topics relating to model development, including data collection, model selection, and other methodological considerations; and
  • ethics and data protection, which considers issues associated with the use of personal data and wider societal interests such as fairness and privacy.


Underpinning these themes are detailed specifications relating to over 200 system requirements.
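
To give a sense of how such a catalogue might be navigated in practice, the short sketch below groups hypothetical per-requirement audit outcomes under the three themes and reports coverage for each. The findings and counts are invented placeholders rather than figures from the actual audit catalogue.

```python
# A sketch of summarising audit findings per theme. The theme names follow
# the three themes above; the individual findings are invented placeholders.
from collections import Counter

# Hypothetical per-requirement audit outcomes: (theme, requirement met?)
findings = [
    ("security in software development", True),
    ("security in software development", False),
    ("functional requirements", True),
    ("functional requirements", True),
    ("ethics and data protection", True),
]

met = Counter(theme for theme, ok in findings if ok)
total = Counter(theme for theme, _ in findings)

for theme in total:
    print(f"{theme}: {met[theme]}/{total[theme]} requirements met")
```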

The first iteration of this framework, published in 2021, focuses on the certification of supervised machine learning methods in low-risk applications. Building on the success of this first version, the audit catalogue is being extended and refined, with the expectation that this work will make an important contribution to European efforts to create a risk-proportionate regulatory system for AI applications.

This snapshot summarises the findings of work by Philip Matthias Winter, Sebastian Eder, Johannes Weissenböck, Christoph Schwald, Thomas Doms, Tom Vogt, Sepp Hochreiter, and Bernhard Nessler.