Distributed AI model training

Expert support for completing large-scale distributed AI model training, including configuration tuning, scaling-law derivation, code optimisation, and debugging.

Service features

The Distributed AI model training assignment provides expert support for executing and optimising large-scale distributed training runs of AI models. Delivered by AI Factory specialists, this service addresses the challenges of scaling AI workloads across multiple GPUs or HPC nodes, ensuring that training jobs run efficiently, reliably, and at scale.

The service covers all aspects of distributed training, including configuration of frameworks, optimisation of resource usage, derivation of scaling laws, and debugging of runtime issues. A typical use case is the training of foundation models requiring distributed runs across hundreds or thousands of GPUs.
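As an illustration of what deriving a scaling law can involve, the sketch below fits a power law of the form loss = a * compute^(-b) to a handful of small training runs and extrapolates to a larger compute budget. It is a minimal example under stated assumptions, not the method used by the AI Factory specialists: the function name `fit_power_law`, the data points, and the constants are all hypothetical, and the fit ignores the irreducible-loss term that practical scaling-law studies usually include.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b) in log-log space.

    Hypothetical helper for illustration only. Returns the pair (a, b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for the line y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free measurements (assumed values, not real results):
# training compute in FLOPs and the final loss of each small run.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [5.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)

# Extrapolate the fitted law to a run with ten times more compute.
predicted = a * 1e22 ** -b
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e22 FLOPs: {predicted:.3f}")
```

With real measurements the points are noisy, so a fit like this is only as trustworthy as the range of compute budgets it covers.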

Request process

Request a consultation via the web form below.

Cost and billing

The LUMI AI Factory consultation is free of charge and open to startups and SMEs from EU Member States and from countries associated with the Horizon Europe programme.

Get started

Contact us and our expert team will get back to you with the next steps.