
[Paper Reading][2s Generation] SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

Abstract

Text-to-image diffusion models can create stunning images from natural language
descriptions that rival the work of professional artists and photographers.
However, these models are large, with complex network architectures and tens of denoising
iterations, making them computationally expensive and slow to run.
As a result, high-end GPUs and cloud-based inference are required to run diffusion models at
scale.
This is costly and has privacy implications, especially when user data is sent
to a third party.