Match Stereo Videos via Bidirectional Alignment

Imperial College London

Abstract

Video stereo matching is the task of estimating consistent disparities from stereo videos. There is considerable scope for improvement in both datasets and methods within this field. Recent learning-based methods often focus on optimizing performance for single stereo pairs, leading to temporal inconsistencies. Existing video methods typically employ sliding window operation over time, which can result in low-frequency oscillations corresponding to the window size. To address these challenges, we propose a bidirectional alignment mechanism for adjacent frames as a fundamental operation. Building on this, we introduce a novel video processing framework, BiDAStereo, and a stabilizer network plugin, BiDAStabilizer, which is compatible with general per-frame methods. Regarding datasets, current synthetic object-based and indoor datasets are commonly used for training and benchmarking, with a lack of outdoor datasets. To bridge this gap, we present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse scenarios for qualitative evaluation. Extensive experiments on in-domain, out-of-domain, and robustness evaluation demonstrate the contribution of our methods and datasets, showcasing improvements in prediction quality and achieving state-of-the-art results on various commonly used benchmarks.

Comparsion with existing methods

Description of the image

Proposed Datasets Overall Description

Current stereo depth datasets are limited in both diversity and realism. There is no diversified real-world stereo video dataset for visualization and qualitative comparison. To compensate for the data shortage and boost the performance of learning-based video stereo models, we release a real-world dataset SouthKensington Stereo Video Dataset (Southken SV), and a synthetic dataset, Infinigen Stereo Video Dataset (Infinigen SV).

Download and Usage

This Link specify your download requirements. The releasing of SK-SV dataset is under strict conditions. SK-SV dataset cannot be used or distributed for any commercial purposes. It can only be used for academic and research purposes. SK-SV dataset is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). Under this license, if you want to modify SK-SV or generate new data from SK-SV dataset (e.g., super-resolution, denoising, defocus), the releasing of your new data should be licensed under the same CC BY-NC-SA 4.0. If you need more help or information, please contact: {j.jing23; ye.mao21}@imperial.ac.uk.

BibTeX


  @article{jing2024matchstereovideos,
    title={Match Stereo Videos via Bidirectional Alignment}, 
    author={Junpeng Jing and Ye Mao and Anlan Qiu and Krystian Mikolajczyk},
    year={2024},
    eprint={2403.10755},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
    }