LaRa

Efficient Large-Baseline Radiance Fields

ECCV 2024



University of Tübingen, Tübingen AI Center; ETH Zürich


TL;DR: We train a feed-forward 2DGS model in two days using 4 GPUs.



How LaRa works

Abstract: Radiance field methods have achieved photorealistic novel view synthesis and geometry reconstruction. However, they are mostly applied in per-scene optimization or small-baseline settings. While several recent works investigate feed-forward reconstruction with large baselines using transformers, they all operate with a standard global attention mechanism and hence ignore the local nature of 3D reconstruction. We propose a method that unifies local and global reasoning in transformer layers, resulting in improved quality and faster convergence.
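The core idea of unifying local and global reasoning can be illustrated with a small sketch. The code below is a hypothetical stand-in, not LaRa's actual layers: tokens first attend only within spatial groups (local), then group means exchange information across the whole scene (global). All function names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # standard scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def group_attention(tokens, num_groups):
    """Hypothetical local+global layer: attention within spatial
    groups, then global mixing over per-group mean tokens."""
    n, d = tokens.shape
    groups = tokens.reshape(num_groups, n // num_groups, d)
    # local: each token attends only within its own group
    local = np.stack([attention(g, g, g) for g in groups])
    # global: group means attend to each other across the scene
    means = local.mean(axis=1)                 # (num_groups, d)
    ctx = attention(means, means, means)       # (num_groups, d)
    # broadcast the global context back to every token
    return (local + ctx[:, None, :]).reshape(n, d)

out = group_attention(np.random.randn(16, 8), num_groups=4)
```

Compared with global attention over all tokens, the local pass costs attention only within each group, which is where the efficiency and inductive bias for 3D reconstruction come from.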

Method: LaRa represents scenes as Gaussian Volumes and combines this with an image encoder and Group Attention Layers for efficient feed-forward reconstruction:

  1. Scenes are represented as a Gaussian Volume.
  2. An embedding volume models a 3D prior learned from the dataset.
  3. Per-view DINO features are extracted and lifted into 3D feature volumes.
  4. Volume attention between the feature volumes and the embedding volume outputs the Gaussian Volume.
  5. The Gaussian Volume is transformed into coarse-to-fine 2D Gaussian primitives, which are rendered efficiently via splatting.
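The steps above can be sketched as a data-flow skeleton. This is a minimal illustration, not the released implementation: the mean-over-views "lifting", the single attention step, and the 10-parameter Gaussian decoding are all simplifying assumptions, and every function name and dimension is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed sizes: 4 input views, an 8^3 volume, feature dim 16
V, G, D = 4, 8, 16

def lift_features(per_view_feats):
    # step 3 (simplified): aggregate per-view features lifted into 3D
    # feature volumes; a mean over views stands in for unprojection
    return per_view_feats.mean(axis=0)

def volume_attention(feat_vol, embed_vol):
    # step 4 (simplified): the learned embedding volume queries the
    # feature volume; one softmax-weighted mixing step stands in for
    # the full attention layers
    q = embed_vol.reshape(-1, D)
    k = feat_vol.reshape(-1, D)
    w = np.exp(q @ k.T / np.sqrt(D))
    w /= w.sum(axis=1, keepdims=True)
    return (w @ k).reshape(G, G, G, D)

def decode_gaussians(gauss_vol):
    # step 5 (simplified): map each voxel feature to 2D Gaussian
    # parameters (e.g. center, scale, rotation, opacity, color);
    # 10 numbers per primitive is an illustrative choice
    W = rng.standard_normal((D, 10)) * 0.01
    return gauss_vol.reshape(-1, D) @ W

feats = rng.standard_normal((V, G, G, G, D))  # per-view 3D feature volumes
embed = rng.standard_normal((G, G, G, D))     # learned 3D prior volume
gaussians = decode_gaussians(volume_attention(lift_features(feats), embed))
# one Gaussian primitive per voxel: shape (G**3, 10)
```

The actual model replaces each stand-in with learned networks and renders the resulting 2D Gaussians by splatting.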

Results

From 4 input views, video rendering and mesh extraction are completed within 2s.
Reconstruction results on Gobjaverse testing set
Reconstruction results on Google Scanned Object dataset
Reconstruction results on Instant3D scenes

Comparison

Qualitative comparison (columns: Inputs, MVSNeRF, LGM, Ours, GT)

Ablations

Citation

If you use this work or find it helpful, please consider citing: (bibtex)

@inproceedings{LaRa,
    author    = {Anpei Chen and Haofei Xu and Stefano Esposito and Siyu Tang and Andreas Geiger},
    title     = {LaRa: Efficient Large-Baseline Radiance Fields},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year      = {2024},
}

Acknowledgement

We thank Bozidar Antic for pointing out a bug, whose fix yielded an improvement of about 1 dB. Special thanks to Binbin Huang and Zehao Yu for their helpful discussions and suggestions. We also thank Sai Bi, Jiahao Li, and Zexiang Xu for providing the Instant3D testing examples, and Jiaxiang Tang for helping us construct the comparison with LGM. The website template is partly borrowed from Mip-Splatting and Instruct-GS2GS.