LUMI AI Factory data streaming pilot paves the way for autonomous vehicles
The LUMI AI Factory is advancing rapidly in its first round of data streaming pilots, designed to demonstrate how enormous, fast-moving data flows can be transferred and processed on the LUMI supercomputer, the computing backbone of the LUMI AI Factory. One of the most ambitious pilots focuses on a smart car integration project carried out with the University of Oulu’s Faculty of Information Technology and Electrical Engineering in Finland.
Streaming data from a research vehicle to a supercomputer
At the center of the pilot is a Toyota RAV4 research vehicle equipped with an extensive array of sensors: lidar (Light Detection and Ranging) sensors that measure distances using rapid laser pulses, stereo cameras, thermal cameras, and additional onboard data sources. Together they produce more than 10 gigabytes of data per minute, which adds up to hundreds of gigabytes or even several terabytes from a single test drive.
– We are developing components for autonomous vehicles, such as machine vision algorithms that can recognise Arctic animals, potentially supporting Advanced Driver Assistance Systems (ADAS), explains Benjamin Kämä, Doctoral Researcher at the University of Oulu.

High data rates require LUMI’s computing power
– LUMI’s computing power is essential for us. The sensor and vehicle data rates are so high that any real-time or near real-time processing requires massive computing capacity. Even offline processing – for machine vision applications or the development of digital twins – demands significant performance. Lidar data in particular is extremely heavy, and it is vital for both machine vision and digital twins, Kämä explains.
The long-term vision is a closed processing loop: sensor data flows from the car to LUMI, is processed by machine learning models running in LUMI’s HPC environment, and driving-related outputs are sent back to the vehicle. Such a pipeline could eventually support advanced automated driving functions and enable a real-time digital twin of the car.
Toward real-time digital twins – and safe automated driving
Handling data at this scale in real-world driving conditions is far from trivial. The research group is well advanced with sensor and vehicle data collection and the middleware setup, but the cloud connection and the integrations needed to use LUMI are still under development. The group continues to work with LUMI AI Factory experts to solve these challenges.
– Processing power in the vehicle, network connectivity, cloud capacity, and storage can all become limiting factors. Our goal is to iteratively address each challenge and push towards real-time data use, Kämä notes.

The vision of a continuously connected smart vehicle brings not only technical challenges but also crucial safety considerations.
– Safety is always number one in vehicles. Most systems are safety-critical, which means they require extensive testing and must also handle cases where the network connection fails. That’s why we also focus on local machine vision and machine learning models that can operate independently inside the vehicle if the cloud connection drops, Kämä emphasises.
Even with local fail-safes, cloud and compute resources such as LUMI remain essential for improving model quality, developing advanced ADAS functions, and generating large digital twin simulations of the vehicle.
The research team is also interested in making parts of their dataset openly available through the LUMI AI Factory’s Dataset-as-a-Service data catalogue.
An essential pilot for the LUMI AI Factory
– Being able to stream data from a moving vehicle to LUMI, process it on the fly, and even send results back would be a major demonstration of what our data streaming infrastructure can do. Autonomous vehicle research is an excellent stress test for our tools because the data is extremely heavy, the timing is tight, and reliability matters, notes Heidi Laine, work package (WP) leader for data access and integration at the LUMI AI Factory.
This pilot is essential for showcasing how the LUMI AI Factory can support next-generation industrial workloads involving massive data streams, AI, and safety-critical feedback loops.
Author: Anni Jakobsson, CSC
Image on top: research group