
LUMI AI Factory Data Lab

LUMI AI Factory Dataset-as-a-Service

Dataset-as-a-Service (DaaS) offers curated, high-quality datasets close to high-performance computing resources, enabling researchers, innovators, and industry users to focus on AI development rather than data acquisition and infrastructure management. For data providers, Dataset-as-a-Service is a way to publish high-value datasets to a wider research community.

Service features

For data users, LUMI AI Factory DaaS offers fast, easy access to curated, ethically managed, high‑quality datasets located close to world‑class high-performance computing resources.

Ready‑to‑use datasets save you the time otherwise spent on data acquisition and cleaning, so you can focus on AI development rather than data management. With clear terms of use, a straightforward access process, and expert support, you can rapidly pilot ideas, experiment at scale, and accelerate your research or innovation projects.

As a data user, you will be able to:

  • search or browse the public data catalogue to find AI-ready, curated datasets
  • apply to access a dataset
  • use the dataset in your research
  • combine datasets with your own data, if needed, and delete the data you have uploaded for your own use.

For dataset providers, LUMI AI Factory DaaS offers a trusted, high-visibility platform to share high-value datasets to support Europe’s leading AI and research communities.

Publish your data to increase its impact, discover new use cases, meet compliance requirements, attract collaborators and customers, and participate in cutting-edge scientific and industrial innovation. You remain the data owner while your dataset is safely delivered on top of a world-class high-performance computing (HPC) ecosystem. 

As a data provider, you will be able to:

  • publish use copies of data – you remain the owner and control the master data, but benefit from added innovation by others using your dataset
  • limit the use of data, if needed
  • provide sufficient metadata and get support for generating it
  • get expert support in understanding your datasets, their quality and suitability for sharing
  • have access management for the use copy of your data
  • have data lifecycle management for the use copy

Access/Output Requirements

LUMI AI Factory DaaS datasets can be accessed and used free of charge. However, to access the LUMI supercomputer and run jobs on LUMI with these datasets, you need to be a member of a project that has been granted resources on LUMI.

Many datasets are openly available, but some are restricted, so access to them is handled through an access management process. The access requirements vary from dataset to dataset.

Providing a dataset to LUMI AI Factory DaaS incurs no cost for the dataset provider. To provide a dataset, you need to agree to the Terms of Use of the service, assign a contact person, and provide the necessary metadata. The LUMI AI Factory DaaS catalogue is a curated collection, and we will assess the suitability of the offered dataset before agreeing to publish it.

Target groups

  • Startups and SMEs
  • Large industry
  • Public organisations
  • Academia
  • Research performing organisations

Pricing

The datasets can be accessed free of charge. To access the LUMI supercomputer, the user has to be part of a research project that has been granted computing resources on LUMI. 
