Artificial intelligence (AI) image generators are becoming more powerful, but they usually rely on heavyweight generative models running in the cloud. Now, researchers say they've built a system that can generate high-quality images using roughly 10 times fewer processing steps.
The result is AI that’s fast and efficient enough to run locally on phones and laptops, while being more secure and environmentally friendly than AI that runs in power-hungry data centers.
They outlined how the new model works in a study uploaded Sept. 25, 2025, to the preprint arXiv database, and announced March 4 in a statement that Lenovo has licensed the model for integration into its upcoming on-device AI platform. That means the system will soon appear in forthcoming smartphones, tablets and laptops.
The goal is simple but ambitious: to bring powerful generative AI out of remote data centers and onto the devices people actually use. This not only has implications for environmental impact and privacy, but could also make AI-based image generation faster than ever before.
Why most AI image generators are slow
Most modern text-to-image systems rely on a technique called diffusion. These AI models start with random noise – essentially a grid of pixels filled with random values – and gradually refine it into an image through a long sequence of steps.
Typically, that process takes 30 to 50 iterations to produce a finished image, with each step requiring significant computing power. That’s why many popular AI image generation tools run on large clusters of graphics processing units (GPUs) in remote servers via the cloud, rather than locally on a phone or laptop.
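The iterative refinement described above can be illustrated with a toy sketch. This is not a real diffusion model — there is no neural network here; a simple "nudge toward a clean target" update stands in for the learned denoiser — but it shows the key property: starting from random noise, each step removes only a little of it, so many steps are needed before the residual noise is small. The function names (`denoise_step`, `generate`) and the step strength are illustrative choices, not anything from the study.

```python
import random

def denoise_step(pixels, target, strength):
    # One refinement step: nudge each pixel a small fraction of the way
    # toward the (hypothetical) model's prediction of the clean image.
    return [p + strength * (t - p) for p, t in zip(pixels, target)]

def generate(target, steps, strength=0.15):
    random.seed(0)
    # Start from pure noise: a grid of pixels with random values.
    pixels = [random.uniform(-1.0, 1.0) for _ in target]
    for _ in range(steps):
        pixels = denoise_step(pixels, target, strength)
    return pixels

def error(pixels, target):
    # Mean absolute residual noise relative to the clean image.
    return sum(abs(p - t) for p, t in zip(pixels, target)) / len(target)

clean = [0.2, -0.5, 0.9, 0.0]  # stand-in "clean image"
print(error(generate(clean, 5), clean))   # after a few steps: still noisy
print(error(generate(clean, 50), clean))  # after many steps: nearly clean
```

With cautious steps like these, roughly 30 to 50 iterations are needed before the noise becomes negligible — which is why each generated image costs so much compute.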
That architecture works well for producing high-quality images, but it also creates practical limitations. The models are slow and energy-intensive, and users must send prompts or images to remote servers and wait for a response.
In the new study, the scientists set out to tackle that bottleneck. Their model, called SD3.5-Flash, dramatically shortens the generation pipeline: instead of dozens of iterations, it can produce an image in just four processing steps, the scientists said.
This is achieved by compressing the diffusion process into a more efficient form while preserving image quality. In essence, the system learns how to “jump” through the denoising process in larger leaps rather than inching forward step by step. According to the study, maintaining visual quality while reducing the number of steps is the core technical challenge.
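The "larger leaps" idea can be sketched with simple arithmetic. Again, this is a toy illustration, not the actual SD3.5-Flash method: each step here moves a fixed fraction of the remaining distance to the target, so the residual error shrinks by a factor of (1 − strength) per step. Fifty cautious steps and four aggressive steps can land at essentially the same endpoint — provided the large jumps are tuned correctly, which is the hard part the researchers describe.

```python
def refine(x, target, steps, strength):
    # Each step closes a fixed fraction of the remaining gap to the target,
    # so the residual shrinks geometrically: (1 - strength) ** steps.
    for _ in range(steps):
        x = x + strength * (target - x)
    return x

start, target = 1.0, 0.0
slow = refine(start, target, steps=50, strength=0.15)  # many small steps
fast = refine(start, target, steps=4,  strength=0.87)  # few large "jumps"
print(abs(slow - target), abs(fast - target))  # both residuals are tiny
```

In a real diffusion model the "strength" of each jump is not a single number but a learned prediction, and compressing 50 learned steps into four without losing image quality is what makes the problem technically difficult.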
“Our SD3.5-Flash model allows users to create images from text descriptions entirely on their device, with no data leaving their hardware,” said Hmrishav Bandyopadhyay, a doctoral researcher at the University of Surrey who developed the model during an internship at Stability AI, in the statement. “Achieving this level of efficiency is technically challenging, as it requires compressing a diffusion model to run in only a few steps while maintaining quality.”
Reducing the number of inference steps means the model requires far fewer computational resources, thus making it feasible to run on consumer-grade hardware.
Greater privacy, speed and AI sustainability
Running generative AI locally rather than in the cloud could have several advantages. The first is privacy: if an AI model runs entirely on a device, prompts and generated images don’t need to be sent to remote servers, which reduces the risk of data exposure, interception, or misuse.
The second is speed: with fewer processing steps and no network latency, image generation could become nearly instantaneous.
Finally, there’s an environmental angle. Large cloud AI models consume substantial energy and water through data center operations, but lightweight models running locally can dramatically reduce those demands.
Yi-Zhe Song, director of the SketchX Lab at the University of Surrey, said the broader aim is to make AI more accessible and practical: “SD3.5-Flash puts a powerful creative tool directly in users’ hands while keeping their data private and reducing the energy demands associated with cloud processing.”
In the study, the team tested SD3.5-Flash against traditional diffusion pipelines to measure whether the drastic reduction in processing steps affected the quality of the images. They evaluated the system using standard benchmarks for generative models, including image fidelity and the extent to which outputs match text prompts. These metrics are widely used in machine learning research to compare different image generation approaches.
Tests on standard image-generation benchmarks found the model could deliver results similar to traditional diffusion systems, despite cutting the number of processing steps from around 30–50 down to just four.
Most notably, the technology is already heading toward real products. Lenovo has licensed the model for integration into its upcoming Personal Ambient Intelligence platform, called Qira, which aims to bring AI capabilities directly to consumer devices.
That could enable features like AI image generation on laptops, tablets and smartphones without the need for an internet connection. In March, the company introduced its first set of Qira-compatible devices, including several concept designs, suggesting it won’t be much longer before the new AI system reaches consumer hardware.
If successful, it would represent a broader shift in how generative AI is delivered. Instead of relying on centralized infrastructure, future AI tools may increasingly run locally on the edge — embedded directly into everyday devices. It’s something the researchers see as part of a larger push to make generative AI more efficient and practical.
Compressing large models without sacrificing quality remains an active area of research, but SD3.5-Flash suggests the gap between powerful AI systems and consumer hardware may be shrinking quickly. If companies like Lenovo follow through with device integrations, the next wave of AI creativity tools might not live in the cloud but in your pocket.


