Research highlights:

Building a compute infrastructure for European AI

Access to compute is vital to continuing progress in AI. Europe already has the building blocks for a compute infrastructure that can support the development of sophisticated AI systems. In the long term, next-generation compute methods could deliver a step change in the performance of this infrastructure.

Increasing compute power has been an important enabler of recent advances in AI technologies. Following a phenomenon known as Moore’s Law, over the last six decades the number of transistors that can be built into a computer chip has roughly doubled every two years. Engineering innovations have also delivered chips specialised for developing AI systems. The resulting hardware and software support rapid information processing, allowing researchers to train sophisticated AI systems, including large foundation models – for example, the Large Language Models behind ChatGPT – that can be directed using everyday language (“prompting”) and that have generated renewed excitement about the potential of AI.
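To put Moore’s Law in perspective, a quick back-of-envelope calculation (illustrative only, not a figure from the roadmap) shows what doubling every two years over six decades implies:

```python
# Back-of-envelope Moore's Law arithmetic: one doubling every two years,
# sustained over six decades. Figures are illustrative, not from the roadmap.
decades = 6
doublings = decades * 10 / 2      # 30 doublings in 60 years
growth = 2 ** doublings
print(f"{growth:.0e}x more transistors per chip")   # ~1e+09, a billion-fold
```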

The current rate of hardware development is not sustainable. Engineering design is approaching the limits of circuit miniaturisation on traditional silicon chips. Today’s transistors have become so small that dissipating heat is difficult, and heat-caused errors can significantly corrupt signals during operation. Quantum computing currently faces similar challenges. The increasing energy demands of large-scale compute have also highlighted the environmental impact of AI development, demonstrating the need both for energy-efficient AI development methods and for sustainable energy sources for compute facilities. In response, a new approach to building compute infrastructure is needed. To chart a path towards this new infrastructure, ELISE researchers Petr Taborsky and Lars Kai Hansen at the Technical University of Denmark (DTU) have created a strategic roadmap for compute in Europe.

Demonstrating how the EU can transform the compute landscape in the long term while widening access to high-performance computing today, the roadmap highlights three important technical directions.

HPC virtualization
In the near future, a new generation of middleware, based on improved interconnections between CPUs, GPUs, and nodes, is set to emerge. Together with task-specific accelerators, such as Transformer engines, this will enable HPC virtualization, offering user- and task-specific HPC environments that are more powerful and more energy efficient than today’s systems. Instead of relying on a single computer to perform a computation task, virtualization relies on distributed computing, effectively pooling software and hardware components – memory, for example – across multiple systems to complete a shared task. This federated architecture may also be required from a privacy perspective, when datasets are localised at multiple sites and cannot be copied or moved to a central location.
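The federated pattern can be made concrete with a short sketch. The Python below shows federated averaging – each site updates a model on data that never leaves it, and a coordinator only averages the resulting parameters. This is a minimal illustration of the general idea, not a design taken from the roadmap; all names and values are invented for the example.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=10):
    """A few gradient-descent steps on one site's private data
    (linear regression with squared loss, purely for illustration)."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(weights, sites):
    """One round: every site updates locally; only parameters are pooled."""
    updates = [local_update(weights, X, y) for X, y in sites]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Three sites holding localised datasets that cannot be centralised.
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    sites.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, sites)
print("recovered weights:", w)  # approaches true_w without moving raw data
```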

Quantum computing
Quantum computing develops computing methods based on quantum-mechanical principles. The resulting systems should be able to solve problems that no classical computer could solve in a feasible amount of time. While there has been exciting recent progress in this field, whether so-called quantum supremacy – the creation of a quantum device that can perform calculations that could not be performed by a classical computer – has been achieved remains the subject of active debate. Despite a few successful real-world quantum applications, further work is needed to take these prototype systems to commercial scalability. For example, a quantum encryption link has been developed under the FIRE-Q project, supported by Innovation Fund Denmark, which brings together academic and industrial partners now ready to commercialise the technology. The technology is the result of 20 years of basic research supported by the Danish National Research Foundation through the research centres Silicon Photonics for Optical Communication (SPOC) and Hybrid Quantum Networks (Hy-Q).
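What “computing methods based on quantum principles” means can be sketched classically: a qubit is described by amplitudes over the states |0⟩ and |1⟩, gates are unitary matrices, and measurement probabilities follow from the squared amplitudes. The state-vector simulation below illustrates only these basics and is unrelated to any specific system mentioned above.

```python
import numpy as np

# Classical state-vector simulation of a single qubit (an illustration of
# the mathematics, not a real quantum device).
ket0 = np.array([1.0, 0.0])                    # the |0> basis state
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate (unitary)

state = H @ ket0                 # equal superposition (|0> + |1>) / sqrt(2)
probs = np.abs(state) ** 2       # Born rule: measurement probabilities
print(probs)                     # -> [0.5, 0.5]

# Repeated measurements sample outcomes according to those probabilities.
rng = np.random.default_rng(0)
outcomes = rng.choice([0, 1], size=1000, p=probs)
print(outcomes.mean())           # ~0.5
```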

Neuromorphic computing
Taking inspiration from the highly energy-efficient information processing of the human brain, neuromorphic computing seeks to develop computing systems that are orders of magnitude more energy efficient. Neuromorphic computing methods and processors are in development; while there are few commercial offerings at present, there are a variety of applications where neuromorphic computing could play a role.
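One reason for the potential efficiency gain is that neuromorphic hardware is typically event-driven: spiking neurons only “fire” when their input accumulates past a threshold, so computation is sparse. The leaky integrate-and-fire neuron below is a common textbook model of this behaviour; the parameter values are illustrative and not drawn from the roadmap.

```python
import numpy as np

def lif_neuron(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: integrates input with a leak,
    emits a spike and resets when the membrane potential crosses threshold."""
    v, spikes = 0.0, []
    for i in input_current:
        v += dt * (-v + i) / tau   # leaky integration of the input drive
        if v >= v_thresh:          # threshold crossing -> spike event
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)       # no event, no downstream work
    return spikes

rng = np.random.default_rng(0)
current = rng.uniform(0.0, 2.5, size=100)   # noisy input drive
print(sum(lif_neuron(current)), "spikes in 100 steps")
```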

While research in these areas continues and moves towards commercialisation, the ELISE compute roadmap identifies steps that can be taken today to maximise the effectiveness of Europe’s existing High-Performance Computing infrastructure. Created in 2018, the European High Performance Computing Joint Undertaking (EuroHPC JU) provides a framework for EU countries to coordinate and pool their compute resources, creating a federated infrastructure that can support EU leadership in high-performance computing. This network already provides access to eight supercomputers located across Europe, alongside a further seven systems managed by the PRACE network. As a result, researchers can access compute resources more powerful than those typically available at universities, enabling delivery of robust experimental results in an environment that adheres to regulatory requirements regarding data governance and security.

By reducing the procedural complexity of accessing this network, facilitating discussion between the AI community and EuroHPC (for example, through workshops) in support of virtualizing HPC user environments, and explaining to a wide AI user base how to use the systems – including their governance requirements – the roadmap sets out a route to expanded, powerful, and energy-conscious access to this compute resource across Europe.

This snapshot summarises the findings of ELISE deliverable D-4.6 Report on recommendations for infrastructure roadmap, by Petr Taborsky and Lars Kai Hansen.