Expert technical support for AI Development
Expert support on training distributed AI models
Expert support on completing large-scale distributed AI model training, including configuration tuning, scaling laws, code optimization, and debugging.
Service features
To train a very large AI model, you need to split the work across many GPUs or HPC nodes that compute in parallel. Adding processors generally shortens training time, provided the job scales efficiently. A typical case is the training of foundation models, which requires distributed runs across hundreds or thousands of GPUs. This is exactly the kind of job for a supercomputer.
Our AI specialists help you address the challenge of scaling your AI workload across many processors, to ensure that training jobs run efficiently and reliably. We help you in all aspects of distributed AI model training: configuring frameworks, optimising the usage of computing resources, deriving scaling laws, and debugging runtime issues.
To train AI models at this scale, you should apply for the Large scale access to LUMI to get the computing power you need. You do not need to be an HPC or AI expert, but to be granted such resources, you need to know your subject domain and your data well, and be prepared to spend considerable time, effort and resources of your own. Together we can map the computing and support needs that will make your project a great success.
Access/Output Requirements
- The results must be made public (open science), unless you pay for the computing resources.
Target groups
This service is for organisations that already have large resources – expertise, funding and data. Startups and SMEs can also benefit from this service in a co-operation project.
Large enterprises, public service organisations and research-performing organisations can also apply for the “AI for Science and Collaboration” calls for computing resources of this magnitude.
Request process
Request this service via our contact form.
We will set up a meeting with you to estimate the required resources. To train AI models at this scale, you should also apply for the Large scale access to LUMI to get the computing power you need.
Pricing
This kind of project requires both computing resources and expert support.
Large companies need to pay for this service. The price for the service is based on current LUMI pricing and will be determined together with our experts when the project is planned.
Large companies and public organisations can apply together for the computing resources for this kind of project via the “AI for Science and Collaboration” call.
If you are a startup or SME and think your project requires this service, contact us and let’s evaluate together whether your case qualifies for free-of-charge services.