¹Electronics and Telecommunications Research Institute (ETRI), South Korea
²Computer Vision Lab., Hanyang University, South Korea
The source code is provided for data augmentation purposes only. For more details, please refer to the GitHub repository.
Latent diffusion models (LDMs) have demonstrated superior performance over traditional methods in generating highly detailed and aesthetically pleasing images, which makes them widely used for various image generation and editing tasks, including outpainting. However, most LDM-based outpainting methods impose constraints on resolution and aspect ratio, often leading to the loss of local details and blurring. One way to address these issues is progressive outpainting, where the image is extended outward incrementally. However, naive progressive outpainting suffers from two key challenges: (1) difficulty in effectively capturing global context, making it hard to maintain the original context, and (2) a tendency to generate unnatural patterns. These challenges are particularly pronounced in art, where artists pre-design the composition before painting. As a result, existing methods often introduce visual inconsistencies that distract the viewer and diminish the intended artistic emphasis. To address these limitations, we propose two types of composition planning modules that enhance progressive outpainting by leveraging global structural guidance. These modules guide a pre-trained Stable Diffusion model to consider the overall composition, enabling realistic and contextually appropriate artwork completion without labor-intensive user prompts. Through experiments on diverse artwork images, we demonstrate the effectiveness of our proposed method both quantitatively and qualitatively.
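To make the progressive setting concrete, the sketch below extends an image rightward one window at a time with an off-the-shelf inpainting pipeline from diffusers. This is a minimal illustration of the naive baseline discussed above, not our method; the checkpoint name, window size, and stride are illustrative assumptions.

```python
# Minimal sketch of NAIVE progressive outpainting (the baseline, not our method).
# Assumptions: diffusers is installed, the canvas height is 512, and the
# "runwayml/stable-diffusion-inpainting" checkpoint is used for illustration.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def progressive_outpaint(canvas: Image.Image, steps: int, prompt: str = "",
                         window: int = 512, stride: int = 256) -> Image.Image:
    """Extend `canvas` to the right by `stride` pixels per step."""
    for _ in range(steps):
        w, h = canvas.size
        # Grow the canvas; the new strip on the right is initially empty.
        grown = Image.new("RGB", (w + stride, h))
        grown.paste(canvas, (0, 0))
        # Slide a window over the border so it covers both known and new pixels.
        box = (w + stride - window, 0, w + stride, h)
        tile = grown.crop(box)
        # Mask: white = to be generated (the new strip), black = known content.
        mask = Image.new("L", (window, h), 0)
        mask.paste(255, (window - stride, 0, window, h))
        out = pipe(prompt=prompt, image=tile, mask_image=mask,
                   height=h, width=window).images[0]
        grown.paste(out, box[:2])
        canvas = grown
    return canvas
```

Because each step sees only its local window, context drifts as the canvas grows; this is exactly the failure mode the composition planning modules are designed to counter.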
This project was inspired by stablediffusion-infinity. When outpainting progressively with latent diffusion models, it becomes difficult to preserve the overall context and style of the original input as the number of generation steps grows. Conversely, performing the entire outpainting within a single window makes high resolution hard to achieve and often introduces blurring. To overcome both limitations, we propose a novel progressive outpainting method that also takes a global window into account. The global-window module is attached to the frozen pre-trained denoiser and trained on its own, which reduces training time while preserving the performance of the pre-trained denoiser. Our perspective is fundamentally different from conventional reference-based painting approaches, which explicitly inject objects or styles from reference images: even when there is no corresponding object to copy, our method automatically infers a plausible composition without explicit guidance. For more details, please refer to the paper and the supplementary materials.
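As a rough illustration of this training setup, the PyTorch sketch below freezes a stand-in denoiser and trains only a small global-window module whose output is injected as a residual. The zero-initialized projection is a common ControlNet-style trick for starting from the unmodified pre-trained behavior; all names, shapes, and channel counts here are illustrative assumptions, not the actual composition planning modules from the paper.

```python
# Illustrative sketch only: a trainable global-window module next to a frozen
# denoiser. The real architecture is described in the paper; the stand-in
# `unet` and all shapes below are assumptions for demonstration.
import torch
import torch.nn as nn

class GlobalWindowModule(nn.Module):
    """Encodes a downscaled global view and emits a residual for the denoiser."""
    def __init__(self, in_ch: int = 4, feat_ch: int = 320):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.SiLU(),
        )
        # Zero-initialized projection: at step 0 the residual is exactly zero,
        # so training starts from the untouched pre-trained denoiser.
        self.proj = nn.Conv2d(feat_ch, feat_ch, 1)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, global_latent: torch.Tensor) -> torch.Tensor:
        return self.proj(self.encoder(global_latent))

# Freeze the pre-trained denoiser; only the new module receives gradients.
unet = nn.Conv2d(4, 4, 3, padding=1)  # stand-in for the pre-trained denoiser
for p in unet.parameters():
    p.requires_grad_(False)

module = GlobalWindowModule()
optimizer = torch.optim.AdamW(module.parameters(), lr=1e-4)

residual = module(torch.randn(1, 4, 32, 32))  # a latent-space global view
print(residual.abs().max().item())  # 0.0 at initialization, as intended
```

Training only the attached module keeps the denoiser's weights, and therefore its generative prior, intact, which is what makes the short training time possible.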
Gallery
These results are not cherry-picked; failure cases are also included. Additional results are available in the main paper and the supplementary materials.
🖼️ Photorealistic outpainting at 2875×512 resolution
🎨 Artwork outpainting at approximately 3K×1K resolution
For any questions, discussions, or commercial use inquiries, please contact us at the email address below.
eadyoung@etri.re.kr
@inproceedings{song2025proout,
  title={Progressive Artwork Outpainting via Latent Diffusion Models},
  author={Song, Dae-Young and Yu, Jung-Jae and Cho, Donghyeon},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}