Abstract: Radiance field methods have achieved photorealistic novel view synthesis and geometry reconstruction, but they are mostly applied in per-scene optimization or small-baseline settings. While several recent works investigate feed-forward reconstruction with large baselines using transformers, they all operate with a standard global attention mechanism and hence ignore the local nature of 3D reconstruction. We propose a method that unifies local and global reasoning in transformer layers, resulting in improved quality and faster convergence.
Method: LaRa represents scenes as Gaussian Volumes and combines this representation with an image encoder and Group Attention Layers for efficient feed-forward reconstruction.
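To make the "local and global reasoning" idea concrete, below is a minimal, hypothetical NumPy sketch of attention restricted to token groups (local) followed by attention over all tokens (global). It is an illustration of the general technique only, not LaRa's actual architecture; all function names and the group partitioning scheme are assumptions for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # standard scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def group_then_global(tokens, n_groups):
    # local pass: each token attends only within its group
    groups = np.split(tokens, n_groups)
    local = np.concatenate([attention(g, g, g) for g in groups])
    # global pass: all tokens attend to each other
    return attention(local, local, local)

# toy example: 8 tokens of dimension 4, split into 2 groups
tokens = np.random.randn(8, 4)
out = group_then_global(tokens, n_groups=2)
```

The local pass reduces attention cost within each group and biases the model toward nearby structure, while the global pass restores long-range information flow; the output shape matches the input (here, 8×4).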
If you use this work or find it helpful, please consider citing: (bibtex)
@inproceedings{LaRa,
  author    = {Anpei Chen and Haofei Xu and Stefano Esposito and Siyu Tang and Andreas Geiger},
  title     = {LaRa: Efficient Large-Baseline Radiance Fields},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024},
}
We thank Bozidar Antic for pointing out a bug, whose fix resulted in an improvement of about 1 dB. Special thanks to BinBin Huang and Zehao Yu for their helpful discussions and suggestions. We would also like to thank Bi Sai, Jiahao Li, and Zexiang Xu for providing us with the Instant3D testing examples, and Jiaxiang Tang for helping us construct a comparison with LGM. The website template is partly borrowed from Mip-Splatting and Instruct-GS2GS.