Comparison with existing methods
Video stereo matching is the task of estimating consistent disparities from stereo videos. There is considerable scope for improvement in both datasets and methods within this field. Recent learning-based methods often focus on optimizing performance for single stereo pairs, leading to temporal inconsistencies. Existing video methods typically employ a sliding-window operation over time, which can result in low-frequency oscillations corresponding to the window size. To address these challenges, we propose a bidirectional alignment mechanism for adjacent frames as a fundamental operation. Building on this, we introduce a novel video processing framework, BiDAStereo, and a stabilizer network plugin, BiDAStabilizer, which is compatible with general per-frame methods. Regarding datasets, synthetic object-based and indoor datasets are commonly used for training and benchmarking, while outdoor datasets are lacking. To bridge this gap, we present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse scenarios for qualitative evaluation. Extensive in-domain, out-of-domain, and robustness evaluations demonstrate the contribution of our methods and datasets, with improved prediction quality and state-of-the-art results on various commonly used benchmarks.
@ARTICLE{11458764,
author={Jing, Junpeng and Mao, Ye and Qiu, Anlan and Mikolajczyk, Krystian},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Match Stereo Videos Via Bidirectional Alignment},
year={2026},
volume={},
number={},
pages={1-16},
doi={10.1109/TPAMI.2026.3679033}}