Run:AI Industrial IoT Case Study | Asia Growth Partners

Autonomous Vehicle Company Wayve Ends GPU Scheduling ‘Horror’

Run:AI

Wayve, a London-based company developing artificial intelligence software for self-driving cars, faced a significant challenge with its GPU resources. Its Fleet Learning Loop, a continuous cycle of data collection, curation, model training, re-simulation, and licensing models before deployment into the fleet, consumed a large amount of GPU capacity. Yet although nearly 100 percent of GPU resources were allocated to researchers, less than 45 percent were actually utilized. GPUs were statically assigned to individual researchers, so when a researcher was not using an assigned GPU, no one else could access it. This created the illusion that GPUs for model training were at capacity even as many sat idle.

How one company went from 28% GPU utilization to 73% with Run:ai

Run:AI

The company, a world leader in facial recognition technologies, faced several challenges with GPU utilization. Because GPU resources were statically allocated, teams and projects could not share them, which created bottlenecks and left infrastructure inaccessible. Poor visibility into and management of available resources slowed jobs down. Although existing hardware was underutilized, the visibility issues and bottlenecks made it seem that additional hardware was necessary, leading to increased costs. The company was considering an additional GPU investment, with a planned hardware purchase of over $1 million.

London Medical Imaging & AI Centre Speeds Up Research with Run:ai

Run:AI

The London Medical Imaging & AI Centre for Value Based Healthcare faced several challenges with its AI hardware. Total GPU utilization was below 30%, with some GPUs sitting idle for significant periods despite demand from researchers. On multiple occasions the system was overloaded, with running jobs needing more GPUs than were available. Poor visibility and scheduling led to delays and waste: larger experiments requiring many GPUs sometimes could not start because smaller jobs using only a few GPUs were occupying the resources they needed.

Case Studies.
