Paper: https://arxiv.org/abs/2407.02158
Code: not yet released
Demo: https://jingjingrenabc.github.io/ultrapixel/#paper-info
Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions (e.g., 1K to 6K) within a single model, while maintaining computational efficiency. UltraPixel leverages semantics-rich representations of lower-resolution images in the later denoising stage to guide the whole generation of highly detailed high-resolution images, significantly reducing complexity. Furthermore, we introduce implicit neural representations for continuous upsampling and scale-aware normalization layers adaptable to various resolutions. Notably, both low- and high-resolution processes are performed in the most compact space, sharing the majority of parameters with less than 3% additional parameters for high-resolution outputs, largely enhancing training and inference efficiency. Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images and demonstrating state-of-the-art performance in extensive experiments.
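The abstract mentions "scale-aware normalization layers adaptable to various resolutions". The paper's exact formulation is not given here, but one common way to build such a layer is a normalization whose affine parameters are predicted from the target resolution, so a single set of weights adapts from 1K to 6K feature maps. Below is a minimal NumPy sketch of that idea; the class name, the resolution embedding, and the tiny MLP are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class ScaleAwareNorm:
    """Toy scale-aware normalization (illustrative, not UltraPixel's actual layer):
    a LayerNorm whose per-channel scale and shift are predicted from the
    target resolution, letting one layer serve many output sizes."""

    def __init__(self, channels, hidden=16):
        # Tiny MLP mapping a resolution embedding (log h, log w) -> (gamma, beta).
        self.w1 = rng.normal(0.0, 0.1, (2, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 2 * channels))

    def __call__(self, x, height, width, eps=1e-5):
        # x: (N, channels) token features taken from a height x width feature map.
        cond = np.array([np.log(height), np.log(width)])
        h = np.tanh(cond @ self.w1)
        gamma, beta = np.split(h @ self.w2, 2)
        # Standard per-token normalization over the channel dimension.
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        xn = (x - mu) / np.sqrt(var + eps)
        # Resolution-conditioned affine transform.
        return xn * (1.0 + gamma) + beta

norm = ScaleAwareNorm(channels=4)
feats = rng.normal(size=(3, 4))
out_1k = norm(feats, 1024, 1024)  # same weights...
out_4k = norm(feats, 4096, 4096)  # ...different resolution conditioning
```

Because only the small conditioning MLP is resolution-specific, this kind of design keeps the extra parameter count small, which is in the spirit of the paper's claim of under 3% additional parameters for high-resolution outputs.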