Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation

ECCV 2022

Dae-Young Song¹, Geonsoo Lee¹, HeeKyung Lee², Gi-Mun Um², and Donghyeon Cho¹

¹Computer Vision and Image Processing (CVIP) Lab., Chungnam National University, Daejeon, South Korea
²Electronics and Telecommunication Research Institute (ETRI), Daejeon, South Korea

[ Paper | Supplementary | Code | Data | Checkpoints | Video ]

Abstract

Generate a 360° panorama without genuine ground truth.

Recently, end-to-end deep learning-based stitching models have attracted growing attention. However, the most challenging part of deep learning-based stitching is obtaining pairs of narrow field-of-view input images and wide field-of-view ground-truth images captured from real-world scenes. To overcome this difficulty, we develop a weakly-supervised learning mechanism that trains the stitching model without requiring genuine ground-truth images. In addition, we propose a stitching model that takes multiple real-world fisheye images as input and produces a 360° output image in equirectangular projection format. In particular, our model consists of color consistency correction, warping, and blending stages, and is trained with perceptual and SSIM losses. The effectiveness of the proposed algorithm is verified on two real-world stitching datasets.
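To make the SSIM term mentioned in the abstract more concrete, here is a minimal sketch of an SSIM-based loss in plain Python. It is an illustration only, not the authors' implementation: the function names `ssim_global` and `ssim_loss` are hypothetical, the statistics are taken over the whole image instead of local sliding windows, and the constants K1 = 0.01 and K2 = 0.03 are the commonly used defaults rather than values stated on this page.

```python
def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized grayscale images.

    x, y: flat lists of pixel intensities in [0, data_range].
    Simplification (assumption): statistics are computed over the whole
    image, whereas practical SSIM implementations use local windows.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    # Stabilizing constants with the commonly used K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(pred, target):
    """Loss that is close to 0 for identical images and grows as they diverge."""
    return 1.0 - ssim_global(pred, target)
```

Calling `ssim_loss(img, img)` on any image gives a value of zero up to floating-point rounding, while structurally different images give a larger loss. A training setup would normally use a windowed, differentiable SSIM from a deep learning framework instead of this scalar version.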

Image Stitching Network Architecture

Dataset Configurations

More Ablation Studies

Photoshop Result

Effect of Local Warping Layer

Additional Description for Perceptual Loss

Geometric distortions occur if an L1 loss or low-level feature maps are used for training, because the GT cameras have different centers. Therefore, we compute the perceptual loss on high-level feature maps (the outputs of the 3rd, 4th, and 5th max-pooling layers). Because VGG-16 is trained for classification, its low-level feature maps capture features such as edges and shapes, which would be compared directly against the GT, whereas its high-level feature maps capture the features used to classify objects. The figure below shows the features at the output of each max-pooling layer of VGG-16.
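As a rough illustration of which feature maps the paragraph above refers to, the following sketch walks the standard VGG-16 convolutional configuration and records the output of every max-pooling layer. The layer indexing (each convolution followed by a separate ReLU layer), the 256×512 input size, the mean-squared feature distance, and the helper names `maxpool_taps` and `perceptual_loss` are all assumptions made for illustration; this page does not specify them.

```python
# Standard VGG-16 convolutional configuration: integers are conv output
# channels, "M" marks a 2x2 max-pooling layer with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def maxpool_taps(cfg, height, width):
    """Return (layer_index, channels, height, width) for each max-pool output.

    Assumed layout: every conv is followed by its own ReLU layer, so one
    conv entry advances the layer index by two.
    """
    taps = []
    index = 0
    channels = 3  # RGB input
    for item in cfg:
        if item == "M":
            height, width = height // 2, width // 2
            taps.append((index, channels, height, width))
            index += 1
        else:
            channels = item
            index += 2  # conv + ReLU
    return taps


def perceptual_loss(pred_feats, gt_feats):
    """Mean squared feature distance, averaged over the selected levels.

    pred_feats, gt_feats: one flat list of floats per selected level
    (hypothetical stand-ins for real VGG-16 activations).
    """
    total = 0.0
    for p, g in zip(pred_feats, gt_feats):
        total += sum((a - b) ** 2 for a, b in zip(p, g)) / len(p)
    return total / len(pred_feats)


taps = maxpool_taps(VGG16_CFG, 256, 512)  # input size is an assumption
high_level = taps[2:]  # outputs of the 3rd, 4th, and 5th max-pooling layers
```

Under these assumptions the selected maps are 8, 16, and 32 times smaller than the input along each axis. At that resolution, small positional offsets between the prediction and the GT affect the loss far less than they would in a pixel-wise L1 comparison.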

Download [ file1 | file2 | file3 ] to view the figure at a larger resolution.

Contact

For further questions, please contact eadyoung@naver.com or eadgaudiyoung@gmail.com.

Citation


@InProceedings{Song2022Weakly,
  author={Song, Dae-Young and Lee, Geonsoo and Lee, HeeKyung and Um, Gi-Mun and Cho, Donghyeon},
  title={Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation},
  booktitle={European Conference on Computer Vision (ECCV)},
  pages={54--71},
  year={2022},
  organization={Springer}
}

@article{song2021end,
  title={End-to-End Image Stitching Network via Multi-Homography Estimation},
  author={Song, Dae-Young and Um, Gi-Mun and Lee, Hee Kyung and Cho, Donghyeon},
  journal={IEEE Signal Processing Letters (SPL)},
  volume={28},
  pages={763--767},
  year={2021},
  publisher={IEEE}
}